Submitted to the Annals of Statistics 



FUNCTIONAL REGRESSION FOR GENERAL EXPONENTIAL FAMILIES

By Wei Dou*, David Pollard* and Harrison H. Zhou* 

Yale University 

The paper derives a minimax lower bound for rates of convergence for 
an infinite-dimensional parameter in an exponential family model. An esti- 
mator that achieves the optimal rate is constructed by maximum likelihood 
on finite-dimensional approximations with parameter dimension that grows 
with sample size. 

1. Introduction. Our main purpose in this paper is to extend the theory developed
by Hall and Horowitz (2007), for regression with mean a linear functional of
an unknown square integrable function $\mathbb{B}$ defined on a compact interval of the real
line, to observations from an exponential family whose canonical parameter is
of the form $\int_0^1 \mathbb{B}(t)X_i(t)\,dt$ for observed Gaussian processes $X_i$.

Our methods introduce several new technical devices. We establish a sharp ap- 
proximation for maximum likelihood estimators for exponential families parametr- 
ized by linear functions of m-dimensional parameters, for an m that grows with 
sample size. We develop a change of measure argument — inspired by ideas from 
Le Cam's theory of asymptotic equivalence of models — to eliminate the effect of 
bias terms from the asymptotics of maximization estimators. And we obtain im- 
proved bounds for projections onto subspaces defined by eigenfunctions of pertur- 
bations of compact operators, bounds that simplify arguments involving estimates 
of unknown covariance kernels. 

More precisely, we consider problems where the observed data consist of independent,
identically distributed pairs $(y_i, X_i)$ where each $X_i$ is a Gaussian process
indexed by a compact subinterval of the real line, which with no loss of generality
we take to be $[0,1]$. We write $\mathfrak{m}$ for Lebesgue measure on the Borel sigma-field
of $[0,1]$. We denote the corresponding norm and inner product in the space $\mathcal{L}^2(\mathfrak{m})$
by $\|\cdot\|$ and $\langle\cdot,\cdot\rangle$.

We assume the conditional distribution of $y_i$ given the process $X_i$ comes from
an exponential family $\{Q_\lambda : \lambda \in \mathbb{R}\}$ with parameter

(1)    $\lambda_i = a + \int_0^1 X_i(t)\mathbb{B}(t)\,dt$

for an unknown constant $a$ and an unknown $\mathbb{B} \in \mathcal{L}^2(\mathfrak{m})$.
We focus on estimation of $\mathbb{B}$ using integrated squared error loss:

$L(\mathbb{B}, \widehat{\mathbb{B}}_n) = \|\mathbb{B} - \widehat{\mathbb{B}}_n\|^2 = \int_0^1 \big(\mathbb{B}(t) - \widehat{\mathbb{B}}_n(t)\big)^2\,dt.$

In a companion paper we will show that our methods can be adapted to treat
the problem of prediction of a linear functional $\int_0^1 x(t)\mathbb{B}(t)\,dt$ for a known $x$,
extending theory developed by Cai and Hall (2006). In that paper we also consider
some of the practical realities in applying the results to the economic problem of
predicting occurrence of recessions from the U.S. Treasury yield curve.

Our models are indexed by a set $\mathcal{F}$ of parameters $f = (a, \mathbb{B}, \mu, K)$, where $\mu$ is
the mean and $K$ is the covariance kernel of the Gaussian process. Under assumptions
on $\mathcal{F}$ (see Section 3) analogous to the assumptions made by Hall and Horowitz
(2007) for a problem of functional linear regression, we find a sequence $\{\rho_n\}$ that
decreases to zero for which

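(2)    $\liminf_{n\to\infty} \sup_{f\in\mathcal{F}} \mathbb{P}_{n,f}\|\mathbb{B} - \widehat{\mathbb{B}}_n\|^2/\rho_n > 0$ for every estimating sequence $\{\widehat{\mathbb{B}}_n\}$

and construct one particular estimating sequence of $\widehat{\mathbb{B}}_n$'s for which: for each $\epsilon > 0$
there exists a finite constant $C_\epsilon$ such that

(3)    $\sup_{f\in\mathcal{F}} \mathbb{P}_{n,f}\{\|\mathbb{B} - \widehat{\mathbb{B}}_n\|^2 > C_\epsilon\rho_n\} < \epsilon$ for large enough $n$.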

For the collection of models $\mathcal{F} = \mathcal{F}(R, \alpha, \beta)$ defined in Section 3, the rate $\rho_n$
equals $n^{(1-2\beta)/(\alpha+2\beta)}$.

In Section 9 we establish a minimax lower bound by means of a variation on 
Assouad's Lemma. 

We begin our analysis of the rate-optimal estimator in Section 4, with an approx- 
imation theorem for maximum likelihood estimators in exponential family mod- 
els for parameters whose dimensions change with sample size. The main result is 
stated in a form slightly more general than we need for the present paper because 
we expect the result to find other sieve-like applications. The approximations from 
this section lie at the heart of our construction of an estimator that achieves the 
minimax rate from Section 9. 

As an aid to the reader, we present our construction of the estimating sequence
for (3) in two stages. First (Section 5) we assume that both the mean $\mu$ and the
covariance kernel $K$ are known. This allows us to emphasize the key ideas in our
proofs without the many technical details that need to be handled when $\mu$ and $K$
are estimated in the natural way. Many of those details involve the spectral theory
of compact operators.

We have found some of the results that we need quite difficult to dig out of 
the spectral theory literature. In Section 6 we summarize the theory that we use to 
control errors when approximating K: some of it is a rearrangement of ideas from 
Hall and Horowitz (2007) and Hall and Hosseini-Nasab (2006); some is adapted 
from the notes by Bosq (2000) and the monograph by Birman and Solomjak (1987); 
and some, such as the material in subsection 6.3 on approximation of projections, 
we believe to be new. 

Armed with the spectral theory, we proceed in Section 7 to the case where $\mu$
and $K$ are estimated. We emphasize the parallels with the argument for known $\mu$
and $K$, postponing the proofs of the extra approximation arguments (mostly
collected together as Lemma 28) to the following section.

The final two sections of the paper establish a bound on the Hellinger distance 
between members of an exponential family, the key to our change of measure ar- 
gument, and a maximal inequality for Gaussian processes. 

2. Notation. For each matrix $A$, the spectral norm is defined as
$\|A\|_2 := \sup_{|u|\le 1}|Au|$ and the Frobenius norm by $\|A\|_F := \big(\sum_{i,j}A_{i,j}^2\big)^{1/2}$.
If $A$ is symmetric, with eigenvalues $\lambda_1,\dots,\lambda_k$, then

$\|A\|_2 = \max_i|\lambda_i| = \sup_{|u|\le 1}|u'Au| \le \|A\|_F.$

If $A$ is also positive definite then the absolute values are superfluous for the first
two equalities.
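
The relations between the two norms are easy to check numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `spectral_and_frobenius` and the random test matrix are illustrative and not part of the paper.

```python
import numpy as np

def spectral_and_frobenius(A):
    """Return (spectral norm, Frobenius norm) of a matrix A."""
    spec = np.linalg.norm(A, 2)        # largest singular value
    frob = np.linalg.norm(A, "fro")    # square root of the sum of squared entries
    return spec, frob

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = (M + M.T) / 2                      # an arbitrary symmetric test matrix
spec, frob = spectral_and_frobenius(A)
max_abs_eig = np.max(np.abs(np.linalg.eigvalsh(A)))
```

For the symmetric matrix `A`, `spec` agrees with `max_abs_eig` up to rounding and is no larger than `frob`.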

When we want to indicate that a bound involving constants $c, C, C_1, \dots$ holds
uniformly over all models indexed by a set of parameters $\mathcal{F}$, we write $c(\mathcal{F})$, $C(\mathcal{F})$,
$C_1(\mathcal{F})$, .... By the usual convention for eliminating subscripts, the values of the
constants might change from one paragraph to the next: a constant $C_1(\mathcal{F})$ in one
place needn't be the same as a constant $C_1(\mathcal{F})$ in another place.

For sequences of constants $c_n$ that might depend on $\mathcal{F}$, we write $c_n = O_{\mathcal{F}}(1)$
and $o_{\mathcal{F}}(1)$ and so on to show that the asymptotic bounds hold uniformly over $\mathcal{F}$.

We write $h(P,Q)$ for the Hellinger distance between two probability measures $P$
and $Q$. If both $P$ and $Q$ are dominated by some measure $\nu$, with densities $p$ and $q$,
then $h^2(P,Q) = \nu\big(\sqrt{p} - \sqrt{q}\big)^2$. We use Hellinger distance to bound total
variation distance,

$\|P - Q\|_{TV} := \sup_A|PA - QA| = \tfrac12\nu|p - q| \le h(P,Q).$

For product measures we use the bound

$h^2(\otimes_{i\le n}P_i, \otimes_{i\le n}Q_i) \le \sum_{i\le n}h^2(P_i,Q_i).$
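
Both bounds can be illustrated on distributions with finite support. This is a small sketch, assuming NumPy; the function names and the particular probability vectors are hypothetical choices made only for illustration.

```python
import numpy as np

def hellinger_sq(p, q):
    """h^2(P, Q) = sum of (sqrt(p) - sqrt(q))^2 for discrete densities p, q."""
    return float(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def total_variation(p, q):
    """Total variation distance, half the L1 distance between the densities."""
    return 0.5 * float(np.sum(np.abs(p - q)))

def product(densities):
    """Density of the product measure on the product of the finite sets."""
    out = np.array([1.0])
    for p in densities:
        out = np.outer(out, p).ravel()
    return out

p1, q1 = np.array([0.2, 0.8]), np.array([0.3, 0.7])
p2, q2 = np.array([0.5, 0.3, 0.2]), np.array([0.4, 0.4, 0.2])
P, Q = product([p1, p2]), product([q1, q2])
tv = total_variation(P, Q)
h2_product = hellinger_sq(P, Q)
h2_sum = hellinger_sq(p1, q1) + hellinger_sq(p2, q2)
```

Here `tv` is at most `sqrt(h2_product)`, and `h2_product` is at most `h2_sum`, as the two displayed inequalities require.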

To avoid confusion with transposes, we use the dot notation or superscript notation
to denote derivatives. For example, $\dddot{\psi}$ or $\psi^{(3)}$ both denote the third derivative
of a function $\psi$.

3. The model. Let $\{Q_\lambda : \lambda \in \mathbb{R}\}$ be an exponential family of probability
measures with densities $dQ_\lambda/dQ_0 = f_\lambda(y) = \exp(\lambda y - \psi(\lambda))$. Remember
that $e^{\psi(\lambda)} = Q_0e^{\lambda y}$ and that the distribution $Q_\lambda$ has mean $\psi^{(1)}(\lambda)$ and
variance $\psi^{(2)}(\lambda)$.

We assume:

($\psi$3) There exists an increasing real function $G$ on $\mathbb{R}^+$ such that

    $|\psi^{(3)}(\lambda + h)| \le \psi^{(2)}(\lambda)G(|h|)$ for all $\lambda$ and $h$.

    Without loss of generality we assume $G(0) \ge 1$.
($\psi$2) For each $\epsilon > 0$ there exists a finite constant $C_\epsilon$ for which $\psi^{(2)}(\lambda) \le C_\epsilon\exp(\epsilon\lambda^2)$
    for all $\lambda \in \mathbb{R}$. Equivalently, $\psi^{(2)}(\lambda) \le \exp(o(\lambda^2))$ as $|\lambda| \to \infty$.
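
As a worked illustration (not taken from the paper), the two assumptions can be checked by hand for the Poisson family, taking $Q_0$ to be the Poisson(1) distribution so that $\psi(\lambda) = e^\lambda - 1$. The choices $G(x) = e^x$ and $C_\epsilon = e^{1/(4\epsilon)}$ below are one convenient option, not the only one.

```latex
% Poisson family: every derivative of \psi equals e^\lambda.
\psi^{(1)}(\lambda) = \psi^{(2)}(\lambda) = \psi^{(3)}(\lambda) = e^{\lambda}.
% Assumption (\psi 3) with G(x) = e^{x}, so that G(0) = 1:
|\psi^{(3)}(\lambda + h)| = e^{\lambda} e^{h} \le \psi^{(2)}(\lambda)\, e^{|h|}.
% Assumption (\psi 2): since \lambda - \epsilon\lambda^2 \le 1/(4\epsilon) for all real \lambda,
\psi^{(2)}(\lambda) = e^{\lambda} \le e^{1/(4\epsilon)} \exp(\epsilon \lambda^{2}).
```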

As shown in Section 10, these assumptions on the $\psi$ function imply that

(4)    $h^2(Q_\lambda, Q_{\lambda+\delta}) \le \delta^2\psi^{(2)}(\lambda)(1 + |\delta|)G(|\delta|)$ for all $\lambda, \delta \in \mathbb{R}$.

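Remark. We may assume that $\psi^{(2)}(\lambda) > 0$ for every real $\lambda$. Otherwise we
would have $0 = \psi^{(2)}(\lambda_0) = \mathrm{var}_{\lambda_0}(y) = \nu f_{\lambda_0}(y)\big(y - \psi^{(1)}(\lambda_0)\big)^2$ for some $\lambda_0$,
which would make $y = \psi^{(1)}(\lambda_0)$ for $\nu$ almost all $y$ and $Q_\lambda = Q_{\lambda_0}$ for every $\lambda$.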

We assume the observed data are iid pairs $(y_i, X_i)$ for $i = 1, \dots, n$, where:

(a) Each $\{X_i(t) : 0 \le t \le 1\}$ is distributed like $\{X(t) : 0 \le t \le 1\}$, a Gaussian
process with mean $\mu(t)$ and covariance kernel $K(s,t)$.

(b) $y_i \mid X_i \sim Q_{\lambda_i}$ with $\lambda_i = a + \langle X_i, \mathbb{B}\rangle$ for an unknown $\{\mathbb{B}(t) : 0 \le t \le 1\}$ in
$\mathcal{L}^2(\mathfrak{m})$ and $a \in \mathbb{R}$.

Definition 5. For real constants $\alpha > 1$ and $\beta > (\alpha + 3)/2$ and $R > 0$,
define $\mathcal{F} = \mathcal{F}(R, \alpha, \beta)$ as the set of all $f = (a, \mathbb{B}, \mu, K)$ that satisfy the following
conditions.

(K) The covariance kernel is square integrable with respect to $\mathfrak{m} \otimes \mathfrak{m}$ and has an
eigenfunction expansion (as a compact operator on $\mathcal{L}^2(\mathfrak{m})$)

    $K(s,t) = \sum_{k\in\mathbb{N}}\theta_k\phi_k(s)\phi_k(t)$

where the eigenvalues $\theta_k$ are decreasing with $Rk^{-\alpha} \ge \theta_k \ge \theta_{k+1} + (\alpha/R)k^{-\alpha-1}$.

(a) $|a| \le R$

($\mu$) $\|\mu\| \le R$

($\mathbb{B}$) $\mathbb{B}$ has an expansion $\mathbb{B}(t) = \sum_{k\in\mathbb{N}}b_k\phi_k(t)$ with $|b_k| \le Rk^{-\beta}$, for the
eigenfunctions defined by the kernel $K$.

Remarks. The awkward lower bound for $\theta_k$ in Assumption (K) implies, for
all $k < j$,

(6)    $\theta_k - \theta_j \ge R^{-1}\int_k^j \alpha x^{-\alpha-1}\,dx = R^{-1}\big(k^{-\alpha} - j^{-\alpha}\big).$

If $K$ and $\mu$ were known, we would only need the lower bound $\theta_k \ge R^{-1}k^{-\alpha}$
and not the lower bound for $\theta_k - \theta_{k+1}$. As explained by Hall and Horowitz
(2007, page 76), the stronger assumption is needed when one estimates the
individual eigenfunctions of $K$. Note that the subset $\mathcal{B}_K$ of $\mathcal{L}^2(\mathfrak{m})$ in which $\mathbb{B}$
lies depends on $K$. We regard the need for the stronger assumption on the
eigenvalues and the irksome Assumption ($\mathbb{B}$) as artifacts of the method of
proof, but we have not yet succeeded in removing either assumption.
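
To make the model concrete, here is a minimal simulation sketch, assuming NumPy. Everything specific in it is an illustrative assumption rather than the paper's choice: the cosine basis for the eigenfunctions, the exact power laws $\theta_k = k^{-\alpha}$ and $b_k = k^{-\beta}$ (which satisfy the conditions of Definition 5 for a suitably large $R$), a zero mean function, truncation of the expansions at `n_terms`, and a Bernoulli response with $\psi(\lambda) = \log(1 + e^\lambda)$.

```python
import numpy as np

def simulate(n, alpha=2.0, beta=3.0, a=0.5, n_terms=50, n_grid=201, seed=0):
    """Draw (y_i, X_i), i <= n, from a truncated version of the model (sketch only)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid)
    k = np.arange(1, n_terms + 1, dtype=float)
    theta = k ** (-alpha)                        # eigenvalues theta_k (assumed form)
    b = k ** (-beta)                             # coefficients b_k (assumed form)
    phi = np.sqrt(2.0) * np.cos(np.pi * np.outer(k, t))      # orthonormal cosines
    z = rng.standard_normal((n, n_terms)) * np.sqrt(theta)   # z_{i,k} ~ N(0, theta_k)
    X = z @ phi                                  # mean-zero sample paths on the grid
    lam = a + z @ b                              # a + <X_i, B>, by orthonormality
    prob = 1.0 / (1.0 + np.exp(-lam))            # Bernoulli mean psi'(lambda)
    y = rng.binomial(1, prob)
    return t, X, y, lam

t, X, y, lam = simulate(20)
```

With these defaults $\alpha = 2 > 1$ and $\beta = 3 > (\alpha + 3)/2$, so the smoothness constraints of Definition 5 hold.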

More formally, we write $P_{\mu,K}$ for the distribution (a probability measure on $\mathcal{L}^2(\mathfrak{m})$)
of each Gaussian process $X_i$. The joint distribution of $X_1, \dots, X_n$ is then
$\mathbb{P}_{n,\mu,K} = P^n_{\mu,K}$. We identify the $y_i$'s with the coordinate maps on $\mathbb{R}^n$ equipped with the
product measure $\mathbb{Q}_{n,a,\mathbb{B},X_1,\dots,X_n} := \otimes_{i\le n}Q_{\lambda_i}$, which can also be thought of as the
conditional joint distribution of $(y_1, \dots, y_n)$ given $(X_1, \dots, X_n)$. Thus the $\mathbb{P}_{n,f}$
in equations (2) and (3) can be rewritten as an iterated expectation,

$\mathbb{P}_{n,f} = \mathbb{P}_{n,\mu,K}\mathbb{Q}_{n,a,\mathbb{B},X_1,\dots,X_n},$

the second expectation on the right-hand side averaging out over $y_1, \dots, y_n$ for
given $X_1, \dots, X_n$, the first averaging out over $X_1, \dots, X_n$.

To simplify notation, we will often abbreviate $\mathbb{Q}_{n,a,\mathbb{B},X_1,\dots,X_n}$ to $\mathbb{Q}_{n,a,\mathbb{B}}$.

4. Maximum likelihood estimation. The theory in this section combines ideas
from Portnoy (1988) and from Hjort and Pollard (1993). We write our results in a
notation that makes the applications in Sections 5 and 7 more straightforward. The
notational cost is that the parameters are indexed by $\{0, 1, \dots, N\}$. To avoid an
excess of parentheses we write $N_+$ for $N + 1$. In the applications $N$ changes with
the sample size $n$ and $\mathbb{Q}$ is replaced by $\mathbb{Q}_{n,a,\mathbb{B},N}$ or $\widetilde{\mathbb{Q}}_{n,a,\mathbb{B},N}$.

Suppose $\xi_1, \dots, \xi_n$ are (nonrandom) vectors in $\mathbb{R}^{N_+}$. Suppose $\mathbb{Q} = \otimes_{i\le n}Q_{\lambda_i}$
with $\lambda_i = \xi_i'\gamma$ for a fixed $\gamma = (\gamma_0, \gamma_1, \dots, \gamma_N)$ in $\mathbb{R}^{N_+}$. Under $\mathbb{Q}$, the coordinate
maps $y_1, \dots, y_n$ are independent random variables with $y_i \sim Q_{\lambda_i}$.

The log-likelihood for fitting the model is

$L_n(g) = \sum_{i\le n}(\xi_i'g)y_i - \psi(\xi_i'g)$ for $g \in \mathbb{R}^{N_+}$,

which is maximized (over $\mathbb{R}^{N_+}$) at the MLE $\hat g$ ($= \hat g_n$).

Remark. As a small amount of extra bookkeeping in the following argument
would show, we do not need $\hat g$ to exactly maximize $L_n$. It would suffice to have
$L_n(\hat g)$ suitably close to $\sup_g L_n(g)$. In particular, we need not be concerned
with questions regarding existence or uniqueness of the argmax.
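
Because $L_n$ is concave, a numerical maximizer is usually easy to find. The sketch below uses plain Newton iterations with NumPy; it is one possible way to compute $\hat g$, not a procedure described in the paper. The function name `fit_mle`, the Poisson choice $\psi^{(1)} = \psi^{(2)} = \exp$, and the simulated design are all illustrative assumptions.

```python
import numpy as np

def fit_mle(xi, y, psi1, psi2, n_iter=50, tol=1e-10):
    """Newton's method for L_n(g) = sum_i (xi_i'g) y_i - psi(xi_i'g).

    xi is the n x N_+ matrix with rows xi_i'; psi1 and psi2 are the first
    and second derivatives of psi, applied elementwise.
    """
    g = np.zeros(xi.shape[1])
    for _ in range(n_iter):
        lam = xi @ g
        score = xi.T @ (y - psi1(lam))             # gradient of L_n at g
        hess = (xi * psi2(lam)[:, None]).T @ xi    # minus the Hessian of L_n
        step = np.linalg.solve(hess, score)
        g = g + step
        if np.linalg.norm(step) < tol:
            break
    return g

rng = np.random.default_rng(1)
n, dim = 500, 3
xi = np.column_stack([np.ones(n), rng.standard_normal((n, dim - 1))])
gamma = np.array([0.2, 0.5, -0.3])
y = rng.poisson(np.exp(xi @ gamma))
g_hat = fit_mle(xi, y, np.exp, np.exp)
score_at_mle = xi.T @ (y - np.exp(xi @ g_hat))
```

At convergence the score `score_at_mle` is numerically zero. Plain Newton steps carry no safeguard, so a production implementation would normally add step halving.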

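Define

(i) $J_n = \sum_{i\le n}\xi_i\xi_i'\psi^{(2)}(\lambda_i)$, an $N_+ \times N_+$ matrix

(ii) $w_i := J_n^{-1/2}\xi_i$, an element of $\mathbb{R}^{N_+}$

(iii) $W_n = \sum_{i\le n}w_i\big(y_i - \psi^{(1)}(\lambda_i)\big)$, an element of $\mathbb{R}^{N_+}$

Notice that $\mathbb{Q}W_n = 0$ and $\mathrm{var}_{\mathbb{Q}}(W_n) = \sum_{i\le n}w_iw_i'\psi^{(2)}(\lambda_i) = I_{N_+}$ and
$\mathbb{Q}|W_n|^2 = \mathrm{trace}\big(\mathrm{var}_{\mathbb{Q}}(W_n)\big) = N_+$.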
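
The standardization by $J_n^{-1/2}$ is what makes the variance of $W_n$ the identity. A minimal numerical check, assuming NumPy and a Poisson family ($\psi^{(2)} = \exp$); the function name `standardize` and the random design are hypothetical.

```python
import numpy as np

def standardize(xi, lam, psi2):
    """Return J_n and the matrix whose i-th row is w_i' = (J_n^{-1/2} xi_i)'."""
    weights = psi2(lam)
    J = (xi * weights[:, None]).T @ xi            # sum_i xi_i xi_i' psi2(lambda_i)
    evals, evecs = np.linalg.eigh(J)
    J_inv_half = evecs @ np.diag(evals ** -0.5) @ evecs.T
    w = xi @ J_inv_half                           # J_n^{-1/2} is symmetric
    return J, w

rng = np.random.default_rng(2)
n, dim = 200, 4
xi = np.column_stack([np.ones(n), rng.standard_normal((n, dim - 1))])
gamma = np.array([0.1, 0.3, -0.2, 0.4])
lam = xi @ gamma
J, w = standardize(xi, lam, np.exp)
var_W = (w * np.exp(lam)[:, None]).T @ w          # sum_i w_i w_i' psi2(lambda_i)
```

The matrix `var_W` equals the identity up to rounding, so its trace is $N_+$ (here `dim`).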
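LEMMA 7. Suppose $0 < \epsilon_1 \le 1/2$ and $0 < \epsilon_2 < 1$ and

$\max_{i\le n}|w_i| \le \dfrac{\epsilon_1\epsilon_2}{2G(1)N_+}$ with $G$ as in Assumption ($\psi$3).

Then $\hat g = \gamma + J_n^{-1/2}(W_n + r_n)$ with $|r_n| \le \epsilon_1$ on the set $\{|W_n| \le \sqrt{N_+/\epsilon_2}\}$,
which has $\mathbb{Q}$-probability greater than $1 - \epsilon_2$.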



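PROOF. The equality $\mathbb{Q}|W_n|^2 = N_+$ and Tchebychev give $\mathbb{Q}\{|W_n| > \sqrt{N_+/\epsilon_2}\} \le \epsilon_2$.

Reparametrize by defining $t = J_n^{1/2}(g - \gamma)$. The concave function

$\mathcal{L}_n(t) := L_n(\gamma + J_n^{-1/2}t) - L_n(\gamma) = \sum_{i\le n}y_iw_i't + \psi(\lambda_i) - \psi(\lambda_i + w_i't)$

is maximized at $\hat t_n = J_n^{1/2}(\hat g - \gamma)$. It has derivative

$\dot{\mathcal{L}}_n(t) = \sum_{i\le n}w_i\big(y_i - \psi^{(1)}(\lambda_i + w_i't)\big).$

For a fixed unit vector $u \in \mathbb{R}^{N_+}$ and a fixed $t \in \mathbb{R}^{N_+}$, consider the real-valued
function of the real variable $s$,

$H(s) := u'\dot{\mathcal{L}}_n(st) = \sum_{i\le n}u'w_i\big(y_i - \psi^{(1)}(\lambda_i + sw_i't)\big),$

which has derivatives

$\dot H(s) = -\sum_{i\le n}(u'w_i)(w_i't)\psi^{(2)}(\lambda_i + sw_i't)$
$\ddot H(s) = -\sum_{i\le n}(u'w_i)(w_i't)^2\psi^{(3)}(\lambda_i + sw_i't).$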



imsart-aos ver. 2009/12/15 file: FunctionalRegression.tex date: 21 January 2010 






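Notice that $H(0) = u'W_n$ and $\dot H(0) = -u'\sum_{i\le n}w_iw_i'\psi^{(2)}(\lambda_i)t = -u't$.
Write $M_n$ for $\max_{i\le n}|w_i|$. By virtue of Assumption ($\psi$3),

$|\ddot H(s)| \le \sum_{i\le n}|u'w_i|\,|w_i't|^2\psi^{(2)}(\lambda_i)G(|sw_i't|)$
$\le M_nG(M_n|st|)\,t'\sum_{i\le n}w_iw_i'\psi^{(2)}(\lambda_i)\,t$
$= M_nG(M_n|st|)\,|t|^2.$

By Taylor expansion, for some $0 < s^* < 1$,

$|H(1) - H(0) - \dot H(0)| \le \tfrac12|\ddot H(s^*)| \le \tfrac12M_nG(M_n|t|)\,|t|^2.$

That is,

(8)    $\big|u'\big(\dot{\mathcal{L}}_n(t) - W_n + t\big)\big| \le \tfrac12M_nG(M_n|t|)\,|t|^2.$

Approximation (8) will control the behavior of $\mathcal{L}(s) := \mathcal{L}_n(W_n + su)$, a concave
function of the real argument $s$, for each unit vector $u$. By concavity, the derivative

$\dot{\mathcal{L}}(s) = u'\dot{\mathcal{L}}_n(W_n + su) = -s + R(s)$

is a decreasing function of $s$ with

$|R(s)| \le \tfrac12M_nG(M_n|W_n + su|)\,|W_n + su|^2.$

On the set $\{|W_n| \le \sqrt{N_+/\epsilon_2}\}$ we have

$|W_n \pm \epsilon_1u| \le \sqrt{N_+/\epsilon_2} + \epsilon_1.$

Thus

$M_n|W_n \pm \epsilon_1u| \le \dfrac{\epsilon_1\epsilon_2}{2G(1)N_+}\Big(\sqrt{N_+/\epsilon_2} + \epsilon_1\Big) < 1,$

implying

$|R(\pm\epsilon_1)| \le \tfrac12M_nG(1)\,|W_n \pm \epsilon_1u|^2 \le \tfrac14\epsilon_1\Big(1 + \epsilon_1\sqrt{\epsilon_2/N_+}\Big)^2 < \tfrac58\epsilon_1.$

Deduce that

$\dot{\mathcal{L}}(\epsilon_1) = -\epsilon_1 + R(\epsilon_1) \le -\tfrac38\epsilon_1$
$\dot{\mathcal{L}}(-\epsilon_1) = \epsilon_1 + R(-\epsilon_1) \ge \tfrac38\epsilon_1.$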

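The concave function $s \mapsto \mathcal{L}_n(W_n + su)$ must achieve its maximum for some $s$ in
the interval $[-\epsilon_1, \epsilon_1]$, for each unit vector $u$. It follows that $|\hat t_n - W_n| \le \epsilon_1$. □

COROLLARY 9. Suppose $\xi_i = D\eta_i$ for some nonsingular matrix $D$, so that
$J_n = nDA_nD'$ where $A_n := \frac1n\sum_{i\le n}\eta_i\eta_i'\psi^{(2)}(\lambda_i)$.
If $B_n$ is another nonsingular matrix for which

(10)    $\|A_n - B_n\|_2 \le \big(2\|B_n^{-1}\|_2\big)^{-1}$

and if

(11)    $\max_{i\le n}|\eta_i| \le \dfrac{\epsilon\sqrt{n}/N_+}{G(1)\sqrt{32\|B_n^{-1}\|_2}}$ for some $0 < \epsilon < 1$

then for each set of vectors $\kappa_0, \dots, \kappa_N$ in $\mathbb{R}^{N_+}$ there is a set $\mathcal{Y}_{\kappa,\epsilon}$ with $\mathbb{Q}\mathcal{Y}^c_{\kappa,\epsilon} < 2\epsilon$
on which

$\sum_{0\le j\le N}|\kappa_j'(\hat g - \gamma)|^2 \le \dfrac{6\|B_n^{-1}\|_2}{n\epsilon}\sum_{0\le j\le N}|D^{-1}\kappa_j|^2.$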

Remark. For our applications of the Corollary in Sections 5 and 7, we need
$D = \mathrm{diag}(D_0, D_1, \dots, D_N)$ and $\kappa_j = e_j$, the unit vector with a 1 in its $j$th
position, for $j \le m$ and $\kappa_j = 0$ for $j > m$. In our companion paper we will
need the more general $\kappa_j$'s.

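Proof. First we establish a bound on the spectral distance between $A_n^{-1}$ and $B_n^{-1}$.
Define $H = B_n^{-1}A_n - I$. Then $\|H\|_2 \le \|B_n^{-1}\|_2\|A_n - B_n\|_2 \le 1/2$, which justifies
the expansion

$\|A_n^{-1} - B_n^{-1}\|_2 = \big\|\big((I + H)^{-1} - I\big)B_n^{-1}\big\|_2 \le \sum_{j\ge1}\|H\|_2^j\,\|B_n^{-1}\|_2 \le \|B_n^{-1}\|_2.$

As a consequence, $\|A_n^{-1}\|_2 \le 2\|B_n^{-1}\|_2$.

Choose $\epsilon_1 = 1/2$ and $\epsilon_2 = \epsilon$ in Lemma 7. The bound on $\max_{i\le n}|\eta_i|$ gives the
bound on $\max_{i\le n}|w_i|$ needed by the Lemma:

$n|w_i|^2 = \xi_i'(J_n/n)^{-1}\xi_i = \eta_i'A_n^{-1}\eta_i \le \|A_n^{-1}\|_2|\eta_i|^2.$

Define $K_j := J_n^{-1/2}\kappa_j$, so that $|\kappa_j'(\hat g - \gamma)|^2 \le 2(K_j'W_n)^2 + 2(K_j'r_n)^2$. By
Cauchy-Schwarz,

$\sum_j(K_j'r_n)^2 \le \sum_j|K_j|^2|r_n|^2 = U_\kappa|r_n|^2$

where

$U_\kappa := \sum_j\kappa_j'J_n^{-1}\kappa_j = \sum_jn^{-1}(D^{-1}\kappa_j)'A_n^{-1}D^{-1}\kappa_j \le 2n^{-1}\|B_n^{-1}\|_2\sum_j|D^{-1}\kappa_j|^2.$

For the contribution $V_\kappa := \sum_j|K_j'W_n|^2$ the Cauchy-Schwarz bound is too crude.
Instead, notice that $\mathbb{Q}V_\kappa = U_\kappa$, which ensures that the complement of the set

$\mathcal{Y}_{\kappa,\epsilon} := \{|W_n| \le \sqrt{N_+/\epsilon}\} \cap \{V_\kappa \le U_\kappa/\epsilon\}$

has $\mathbb{Q}$ probability less than $2\epsilon$. On the set $\mathcal{Y}_{\kappa,\epsilon}$,

$\sum_j|\kappa_j'(\hat g - \gamma)|^2 \le 2V_\kappa + 2U_\kappa|r_n|^2 \le \big(2/\epsilon + 1/2\big)U_\kappa.$

The asserted bound follows. □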

5. Known Gaussian distribution. Initially we suppose that $\mu$ and $K$ are known.
We can then calculate all the eigenvalues $\theta_k$, the eigenfunctions $\phi_k$ for $K$, and the
coefficients $z_{i,k} := \langle Z_i, \phi_k\rangle$ for the expansion

$X_i - \mu = Z_i = \sum_{k\in\mathbb{N}}z_{i,k}\phi_k.$

The random variables $z_{i,k}$ are independent with $z_{i,k} \sim N(0, \theta_k)$. The random
variables $\eta_{i,k} := z_{i,k}/\sqrt{\theta_k}$ are independent standard normals.

Under $\mathbb{Q}_n = \mathbb{Q}_{n,a,\mathbb{B}}$, the $y_i$'s are independent, with $y_i \sim Q_{\lambda_i}$ and

$\lambda_i = a + \langle X_i, \mathbb{B}\rangle = b_0 + \sum_{k\in\mathbb{N}}z_{i,k}b_k$ where $b_0 = a + \langle\mu, \mathbb{B}\rangle$.

Our task is to estimate the $b_k$'s with sufficient accuracy to be able to estimate
$\mathbb{B}(t) = \sum_{k\in\mathbb{N}}b_k\phi_k(t)$ within an error of order $\rho_n = n^{(1-2\beta)/(\alpha+2\beta)}$. In fact
it will suffice to estimate the component $H_m\mathbb{B}$ of $\mathbb{B}$ in the subspace spanned by
$\{\phi_1, \dots, \phi_m\}$ with $m \sim n^{1/(\alpha+2\beta)}$ because

(12)    $\|H_m^\perp\mathbb{B}\|^2 = \sum_{k>m}b_k^2 = O_{\mathcal{F}}(m^{1-2\beta}) = O_{\mathcal{F}}(\rho_n).$

We might try to estimate the coefficients $(b_0, \dots, b_m)$ by choosing $\hat g = (\hat g_0, \dots, \hat g_m)$
to maximize a conditional log likelihood over all $g$ in $\mathbb{R}^{m+1}$,

$\sum_{i\le n}y_i\lambda_{i,m} - \psi(\lambda_{i,m})$ with $\lambda_{i,m} = g_0 + \sum_{1\le k\le m}z_{i,k}g_k.$

To this end we might try to appeal to Corollary 9 in Section 4, with $\kappa_j$ equal to the
unit vector with a 1 in its $j$th position for $j \le m$ and $\kappa_j = 0$ otherwise. That would
give a bound for $\sum_{j\le m}(\hat g_j - \gamma_j)^2$. Unfortunately, we cannot directly invoke the
Corollary with $N = m$ to estimate $\gamma = (b_0, b_1, \dots, b_N)$ when

(13)    $\mathbb{Q} = \mathbb{Q}_{n,a,\mathbb{B}}$ and $D = \mathrm{diag}(1, \theta_1, \dots, \theta_N)^{1/2}$,
        $\xi_i' = (1, z_{i,1}, \dots, z_{i,N})$ and $\eta_i' = (1, \eta_{i,1}, \dots, \eta_{i,N})$

because $\lambda_i \ne \xi_i'\gamma$.
Remark. We could modify Corollary 9 to allow $\lambda_i = \xi_i'\gamma + \mathrm{bias}_i$, for a
suitably small bias term, but at the cost of extra regularity conditions and a
more delicate argument. The same difficulty arises whenever one investigates
the asymptotics of maximum likelihood with the true distribution outside the
model family.

Instead, we use a two-stage estimation procedure that eliminates the bias term
by a change of measure. Condition on the $X_i$'s. Consider an $N$ much larger than $m$
for which

$N \sim n^\zeta$ with $(2 + 2\alpha)^{-1} > \zeta > (\alpha + 2\beta - 1)^{-1}$.

Such a $\zeta$ exists because the assumptions $\alpha > 1$ and $\beta > (\alpha + 3)/2$ imply
$\alpha + 2\beta - 1 > 2 + 2\alpha$. Define $\xi_i$, $D$, and $\eta_i$ as in equation (13). For $\mathbb{Q}$ use the
probability measure

$\mathbb{Q}_{n,a,\mathbb{B},N} := \otimes_{i\le n}Q_{\lambda_{i,N}}$ with $\lambda_{i,N} := \xi_i'\gamma$ and $\gamma' = (b_0, b_1, \dots, b_N)$.
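
The exponent bookkeeping behind the choices of $m$ and $N$ can be sketched in a few lines. The function name `rate_exponents` and its dictionary keys are hypothetical; the formulas are the ones stated in the text.

```python
def rate_exponents(alpha, beta):
    """Exponents of n for rho_n, m, and the admissible range of zeta (N ~ n^zeta)."""
    if not (alpha > 1 and beta > (alpha + 3) / 2):
        raise ValueError("need alpha > 1 and beta > (alpha + 3)/2")
    return {
        "rho": (1 - 2 * beta) / (alpha + 2 * beta),   # rho_n = n^rho
        "m": 1 / (alpha + 2 * beta),                  # m ~ n^(1/(alpha + 2 beta))
        "zeta_low": 1 / (alpha + 2 * beta - 1),       # zeta must exceed this
        "zeta_high": 1 / (2 + 2 * alpha),             # zeta must stay below this
    }

r = rate_exponents(alpha=2.0, beta=3.0)
```

For $\alpha = 2$ and $\beta = 3$ the admissible interval for $\zeta$ is $(1/7, 1/6)$, the exponent for $m$ is $1/8$, and $m^{1+\alpha}/n$ has exponent $3/8 - 1 = -5/8$, the same as $\rho_n$.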



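Choose $B_n := \mathbb{P}_{n,\mu,K}A_n$. Define $\mathcal{X}_n = \mathcal{X}_{Z,n} \cap \mathcal{X}_{\eta,n} \cap \mathcal{X}_{A,n}$, where

(14)    $\mathcal{X}_{Z,n} := \{\max_{i\le n}\|Z_i\|^2 \le C_0\log n\}$

(15)    $\mathcal{X}_{\eta,n} := \{\max_{i\le n}|\eta_i|^2 \le C_0N\log n\}$

(16)    $\mathcal{X}_{A,n} := \{\|A_n - B_n\|_2 \le (2\|B_n^{-1}\|_2)^{-1}\}$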


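If we choose a large enough constant $C_0 = C_0(\mathcal{F})$, Lemma 41 and its Corollary in
Section 11 ensure that $\mathbb{P}_{n,\mu,K}\mathcal{X}^c_{Z,n} \le 2/n$ and $\mathbb{P}_{n,\mu,K}\mathcal{X}^c_{\eta,n} \le 2/n$; and in
subsection 5.1 we show that

$\|B_n^{-1}\|_2 = O_{\mathcal{F}}(1)$ and $\mathbb{P}_{n,\mu,K}\|A_n - B_n\|_2^2 = o_{\mathcal{F}}(1)$.

Thus $\mathbb{P}_{n,\mu,K}\mathcal{X}_n^c = o_{\mathcal{F}}(1)$. Moreover, on the set $\mathcal{X}_n$, inequality (10) holds by
construction and inequality (11) holds for large enough $n$ because

$\max_{i\le n}|\eta_i|^2 \le O_{\mathcal{F}}(N\log n) = o_{\mathcal{F}}(\sqrt{n}/N).$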

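Estimate $\gamma$ by the $\hat g = (\hat g_0, \dots, \hat g_N)$ defined in Section 4. Then discard most of
the estimates by defining $\widehat{\mathbb{B}}_n := \sum_{1\le k\le m}\hat g_k\phi_k$. For each realization of the $X_i$'s
in $\mathcal{X}_n$, Corollary 9 gives a set $\mathcal{Y}_{m,\epsilon}$ with $\mathbb{Q}_{n,a,\mathbb{B},N}\mathcal{Y}^c_{m,\epsilon} < 2\epsilon$ on which

$\sum_{1\le k\le m}|\hat g_k - \gamma_k|^2 = O_{\mathcal{F}}\big(n^{-1}\sum_{1\le k\le m}\theta_k^{-1}\big) = O_{\mathcal{F}}(m^{1+\alpha}/n) = O_{\mathcal{F}}(\rho_n),$

which implies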

$\|\widehat{\mathbb{B}}_n - \mathbb{B}\|^2 = \sum_{1\le k\le m}|\hat g_k - \gamma_k|^2 + \sum_{k>m}b_k^2 = O_{\mathcal{F}}(\rho_n).$

In replacing $\mathbb{Q}_{n,a,\mathbb{B}}$ by $\mathbb{Q}_{n,a,\mathbb{B},N}$ we eliminate the bias problem but now we have
to relate the probability bounds for $\mathbb{Q}_{n,a,\mathbb{B},N}$ to bounds involving $\mathbb{Q}_{n,a,\mathbb{B}}$. As we
show in subsection 5.2, there exists a sequence of nonnegative constants $c_n$ of
order $o_{\mathcal{F}}(\log n)$, such that

(17)    $\|\mathbb{Q}_{n,a,\mathbb{B}} - \mathbb{Q}_{n,a,\mathbb{B},N}\|^2_{TV} \le e^{2c_n}\sum_{i\le n}|\lambda_i - \lambda_{i,N}|^2$ on $\mathcal{X}_n$.

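From this inequality it follows, for a large enough constant $C_\epsilon$, that

$\mathbb{P}_{n,\mu,K}\mathbb{Q}_{n,a,\mathbb{B}}\{\|\widehat{\mathbb{B}}_n - \mathbb{B}\|^2 > C_\epsilon\rho_n\}$
$\le \mathbb{P}_{n,\mu,K}\mathcal{X}_n^c + \mathbb{P}_{n,\mu,K}\mathcal{X}_n\big(\|\mathbb{Q}_{n,a,\mathbb{B}} - \mathbb{Q}_{n,a,\mathbb{B},N}\|_{TV} + \mathbb{Q}_{n,a,\mathbb{B},N}\mathcal{Y}^c_{m,\epsilon}\big)$
$\le o_{\mathcal{F}}(1) + 2\epsilon + e^{c_n}\Big(\sum_{i\le n}\mathbb{P}_{n,\mu,K}|\lambda_i - \lambda_{i,N}|^2\Big)^{1/2}.$

By construction,

$\lambda_i - \lambda_{i,N} = \sum_{k>N}z_{i,k}b_k$

with the $z_{i,k}$'s independent and $z_{i,k} \sim N(0, \theta_k)$. Thus

$\sum_{i\le n}\mathbb{P}_{n,\mu,K}|\lambda_i - \lambda_{i,N}|^2 = n\sum_{k>N}\theta_kb_k^2 = O_{\mathcal{F}}(nN^{1-\alpha-2\beta}) = o_{\mathcal{F}}(e^{-2c_n})$

because $\zeta > (\alpha + 2\beta - 1)^{-1}$. That is, we have an estimator that achieves the
$O_{\mathcal{F}}(\rho_n)$ minimax rate.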

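5.1. Approximation of $A_n$. Throughout this subsection abbreviate $\mathbb{P}_{n,\mu,K}$ to $\mathbb{P}$.
Remember that

$A_n = n^{-1}\sum_{i\le n}\eta_i\eta_i'\psi^{(2)}(\lambda_{i,N})$ with $\lambda_{i,N} = \gamma'D\eta_i$,

where

$\gamma' = (b_0, b_1, \dots, b_N)$
$D = \mathrm{diag}(1, \sqrt{\theta_1}, \dots, \sqrt{\theta_N}).$

With $B_n = \mathbb{P}A_n$, we need to show $\|B_n^{-1}\|_2 = O_{\mathcal{F}}(1)$ and $\mathbb{P}\|A_n - B_n\|_2^2 = o_{\mathcal{F}}(1)$.

The matrix $A_n$ is an average of $n$ independent random matrices each of which is
distributed like $\mathbb{N}\mathbb{N}'\psi^{(2)}(\gamma'D\mathbb{N})$, where $\mathbb{N}' = (N_0, N_1, \dots, N_N)$ with $N_0 \equiv 1$ and
the other $N_j$'s are independent $N(0,1)$'s. Moreover, by rotational invariance of the
spherical normal, we may assume with no loss of generality that $\gamma'D\mathbb{N} = \bar a + \kappa N_1$,
where

$\bar a := b_0$ and $\kappa^2 := \sum_{1\le k\le N}\theta_kb_k^2.$

Thus

$B_n = \mathbb{P}\mathbb{N}\mathbb{N}'\psi^{(2)}(\bar a + \kappa N_1) = \mathrm{diag}(F, r_0I_{N-1})$

where

$r_j := \mathbb{P}N_1^j\psi^{(2)}(\bar a + \kappa N_1)$ and $F = \begin{pmatrix} r_0 & r_1 \\ r_1 & r_2 \end{pmatrix}.$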

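The block diagonal form of $B_n$ simplifies calculation of spectral norms.

$\|B_n^{-1}\|_2 = \|\mathrm{diag}(F^{-1}, r_0^{-1}I_{N-1})\|_2 \le \max\big(\|F^{-1}\|_2, \|r_0^{-1}I_{N-1}\|_2\big) \le \max\Big(\dfrac{r_0 + r_2}{r_0r_2 - r_1^2}, r_0^{-1}\Big).$

Assumption ($\psi$2) ensures that both $r_0$ and $r_2$ are $O_{\mathcal{F}}(1)$.

Continuity and strict positivity of $\psi^{(2)}$, together with $\max(|\bar a|, \kappa) = O_{\mathcal{F}}(1)$,
ensure that $c_0 := \inf_{\mathcal{F}}\inf_{|x|\le1}\psi^{(2)}(\bar a + \kappa x) > 0$. Thus

$\sqrt{2\pi}\,r_0 \ge c_0\int_{-1}^{+1}e^{-x^2/2}\,dx > 0.$

Similarly

$\sqrt{2\pi}(r_0r_2 - r_1^2) = \sqrt{2\pi}\,r_0\,\mathbb{P}\psi^{(2)}(\bar a + \kappa N_1)(N_1 - r_1/r_0)^2$
$\ge c_0r_0\int_{-1}^{+1}(x - r_1/r_0)^2e^{-x^2/2}\,dx \ge c_0r_0\int_{-1}^{+1}x^2e^{-x^2/2}\,dx.$

It follows that $\|B_n^{-1}\|_2 = O_{\mathcal{F}}(1)$.

The random matrix $A_n - B_n$ is an average of $n$ independent random matrices
each distributed like $\mathbb{N}\mathbb{N}'\psi^{(2)}(\bar a + \kappa N_1)$ minus its expected value. Thus

$\mathbb{P}\|A_n - B_n\|_2^2 \le \mathbb{P}\|A_n - B_n\|_F^2 = n^{-1}\sum_{0\le j,k\le N}\mathrm{var}\big(N_jN_k\psi^{(2)}(\gamma'D\mathbb{N})\big).$

Assumption ($\psi$2) ensures that each summand is $O_{\mathcal{F}}(1)$, which leaves us with a
$O_{\mathcal{F}}(N^2/n) = o_{\mathcal{F}}(1)$ upper bound.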

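5.2. Total variation argument. To establish inequality (17) we use the bound

$\|\mathbb{Q}_{n,a,\mathbb{B}} - \mathbb{Q}_{n,a,\mathbb{B},N}\|^2_{TV} \le h^2(\mathbb{Q}_{n,a,\mathbb{B}}, \mathbb{Q}_{n,a,\mathbb{B},N}) \le \sum_{i\le n}h^2(Q_{\lambda_i}, Q_{\lambda_{i,N}}).$

By inequality (4),

$h^2(Q_{\lambda_i}, Q_{\lambda_{i,N}}) \le \delta_i^2\psi^{(2)}(\lambda_i)(1 + |\delta_i|)G(|\delta_i|)$

where

$|\delta_i| = |\lambda_i - \lambda_{i,N}| = |\langle Z_i, \mathbb{B}\rangle - \langle H_NZ_i, \mathbb{B}\rangle|$
$= |\langle Z_i, H_N^\perp\mathbb{B}\rangle|$
$\le \|Z_i\|\,\|H_N^\perp\mathbb{B}\|$
$\le O_{\mathcal{F}}\big(\sqrt{N^{1-2\beta}\log n}\big)$
$= o_{\mathcal{F}}(1).$

Thus all the $(1 + |\delta_i|)G(|\delta_i|)$ factors can be bounded by a single $O_{\mathcal{F}}(1)$ term.
For $(a, \mathbb{B}, \mu, K) \in \mathcal{F}(R, \alpha, \beta)$ and with the $\|Z_i\|$'s controlled by $\mathcal{X}_n$,

$|\lambda_i| \le |a| + (\|\mu\| + \|Z_i\|)\|\mathbb{B}\| \le C_2\sqrt{\log n}$

for some constant $C_2 = C_2(\mathcal{F})$. Assumption ($\psi$2) then ensures that all the $\psi^{(2)}(\lambda_i)$
are bounded by a single $\exp(o_{\mathcal{F}}(\log n))$ term.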

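6. Approximation of compact operators. Suppose $T$ is a positive, (self-adjoint)
compact operator on a Hilbert space $\mathcal{H}$ with eigenvectors $\{e_k\}$ and eigenvalues $\{\theta_k\}$.
That is, $Te_i = \theta_ie_i$ with $\theta_1 \ge \theta_2 \ge \dots \ge 0$. For each $x$ in $\mathcal{H}$,

$Tx = \sum_{k\in\mathbb{N}}\theta_k\langle e_k, x\rangle e_k$, that is, $T = \sum_{k\in\mathbb{N}}\theta_ke_k \otimes e_k$,

a series that converges in operator norm.

Let $\widetilde T$ be another positive, (self-adjoint) compact operator on $\mathcal{H}$ with
corresponding representation

$\widetilde T = \sum_{k\in\mathbb{N}}\tilde\theta_k\tilde e_k \otimes \tilde e_k.$

Define $\Delta := \widetilde T - T$ and $\delta = \|\Delta\|$. The operator $\widetilde T$ also has a representation

(18)    $\widetilde T = \sum_{j,k\in\mathbb{N}}\widetilde T_{j,k}\,e_j \otimes e_k.$

Note that $\widetilde T_{j,k} = \widetilde T_{k,j}$ because $\widetilde T$ is self-adjoint. This representation gives

$\Delta = \sum_{j,k\in\mathbb{N}}\big(\widetilde T_{j,k} - \theta_j\{j = k\}\big)e_j \otimes e_k$

and

$\|\Delta\|^2 = \sup_{\|x\|=1}\langle x, \Delta x\rangle^2 \le \sum_{j,k\in\mathbb{N}}\big(\widetilde T_{j,k} - \theta_j\{j = k\}\big)^2.$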

The last inequality will lend itself to the calculation of the expected value of $\|\Delta\|^2$
when $\widetilde T$ is random, leading to probabilistic bounds for $\delta$.
In this section we collect some general consequences of $\delta$ being small. In the
next section we draw probabilistic conclusions when $\widetilde T$ is random, for the special
case where $T = K$ and $\widetilde T = \widetilde K$, the usual estimate of the covariance kernel, both
acting on $\mathcal{H} = \mathcal{L}^2(\mathfrak{m})$. The eigenvectors will become eigenfunctions $\phi_1, \phi_2, \dots$
and $\tilde\phi_1, \tilde\phi_2, \dots$. We feel this approach makes it easier to follow the overall
argument.

Both $\{e_j : j \in \mathbb{N}\}$ and $\{\tilde e_k : k \in \mathbb{N}\}$ are orthonormal bases for $\mathcal{H}$. Define
$\sigma_{j,k} := \langle e_j, \tilde e_k\rangle$. Then

$\tilde e_k = \sum_{j\in\mathbb{N}}\sigma_{j,k}e_j$ and $e_j = \sum_{k\in\mathbb{N}}\sigma_{j,k}\tilde e_k$

and

$\{j = j'\} = \langle e_j, e_{j'}\rangle = \sum_{k\in\mathbb{N}}\sigma_{j,k}\sigma_{j',k}.$

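6.1. Approximation of eigenvalues. The eigenvalues have a variational
characterization (Bosq, 2000, Section 4.2):

(19)    $\theta_j = \inf_{\dim(L)<j}\sup\{\langle x, Tx\rangle : x \perp L$ and $\|x\| = 1\}.$

The first infimum runs over all subspaces $L$ with dimension at most $j - 1$. (When $j$
equals 1 the only such subspace is $\{0\}$.) Both the infimum and the supremum are
achieved: by $L_{j-1} = \mathrm{span}\{e_i : 1 \le i < j\}$ and $x = e_j$. Similar assertions hold
for $\widetilde T$ and its eigenvalues, with $\widetilde L_{j-1} = \mathrm{span}\{\tilde e_i : 1 \le i < j\}$.
By the analog of (19) for $\widetilde T$,

$\tilde\theta_j = \sup\{\langle x, \widetilde Tx\rangle : x \perp \widetilde L_{j-1}$ and $\|x\| = 1\}$
$\ge \sup\{\langle x, Tx\rangle - \delta : x \perp \widetilde L_{j-1}$ and $\|x\| = 1\} \ge \theta_j - \delta.$

The final inequality holds because $\theta_j$ is the infimum in (19). Argue similarly with
the roles of $T$ and $\widetilde T$ reversed to conclude that

(20)    $|\tilde\theta_j - \theta_j| \le \delta$ for all $j \in \mathbb{N}$.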
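
Inequality (20) has a familiar finite-dimensional counterpart (Weyl's inequality for symmetric matrices), which the following sketch checks numerically. It assumes NumPy; the matrices are random stand-ins for $T$ and $\widetilde T$, and the perturbed matrix is only required to be symmetric here.

```python
import numpy as np

rng = np.random.default_rng(3)
p = 8
M = rng.standard_normal((p, p))
T = M @ M.T / p                         # positive semidefinite stand-in for T
E = rng.standard_normal((p, p))
Delta = 0.05 * (E + E.T)                # small symmetric perturbation
T_tilde = T + Delta

# Eigenvalues sorted in decreasing order, matching theta_1 >= theta_2 >= ...
theta = np.sort(np.linalg.eigvalsh(T))[::-1]
theta_tilde = np.sort(np.linalg.eigvalsh(T_tilde))[::-1]

delta = np.linalg.norm(Delta, 2)        # operator norm of the perturbation
max_gap = np.max(np.abs(theta_tilde - theta))
```

The largest eigenvalue discrepancy `max_gap` never exceeds `delta`.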

6.2. Approximation of eigenvectors. We cannot hope to find a useful bound
on $\|\tilde e_k - e_k\|$, because there is no way to decide which of $\pm\tilde e_k$ should be
approximating $e_k$. However, we can bound $\|f_k\|$, where

$f_k = \sigma_k\tilde e_k - e_k$ with $\sigma_k := \mathrm{sign}(\sigma_{k,k}) := +1$ if $\sigma_{k,k} \ge 0$, and $-1$ otherwise,

which will be enough for our purposes.
We also need to assume that the eigenvalue $\theta_k$ is well separated from the other $\theta_j$'s,
to avoid the problem that the eigenspace of $T$ for the eigenvalue $\theta_k$ might have
dimension greater than one. More precisely, we consider a $k$ for which

$\epsilon_k := \min\{|\theta_j - \theta_k| : j \ne k\} > 5\delta,$

which implies

$|\tilde\theta_k - \theta_j| \ge |\theta_k - \theta_j| - \delta \ge \tfrac45|\theta_k - \theta_j| \ge \tfrac45\epsilon_k.$

The starting point for our approximations is the equality

(21)    $\langle\Delta\tilde e_k, e_j\rangle = \langle\widetilde T\tilde e_k, e_j\rangle - \langle\tilde e_k, Te_j\rangle = (\tilde\theta_k - \theta_j)\sigma_{j,k}.$

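For $j \ne k$ we then have

$\tfrac{16}{25}(\theta_k - \theta_j)^2\sigma_{j,k}^2 \le \langle\sigma_k\Delta\tilde e_k, e_j\rangle^2 \le 2\langle\Delta f_k, e_j\rangle^2 + 2\langle\Delta e_k, e_j\rangle^2,$

which implies

$\sigma_{j,k}^2 \le \tfrac{25}{8}\langle\Delta f_k, e_j\rangle^2/\epsilon_k^2 + \tfrac{25}{8}\widetilde T_{j,k}^2/(\theta_k - \theta_j)^2$ because $\langle Te_k, e_j\rangle = 0$ for $j \ne k$.

To simplify notation, write $\sum_j^*$ for $\sum_{j\in\mathbb{N}}\{j \ne k\}$.
The introduction of the $\sigma_k$ also ensures that

$\|f_k\|^2 = \|\tilde e_k\|^2 + \|e_k\|^2 - 2\sigma_k\langle\tilde e_k, e_k\rangle = 2 - 2|\sigma_{k,k}|$
$\le 2 - 2\sigma_{k,k}^2$ because $|\sigma_{k,k}| \le 1$
$= 2\sum_j^*\sigma_{j,k}^2$
$\le \tfrac{25}{4}\sum_j^*\langle\Delta f_k, e_j\rangle^2/\epsilon_k^2 + \tfrac{25}{4}\sum_j^*\widetilde T_{j,k}^2/(\theta_k - \theta_j)^2.$

The first sum on the right-hand side is less than

$\tfrac{25}{4}\|\Delta f_k\|^2/\epsilon_k^2 \le \|\Delta\|^2\|f_k\|^2/(4\delta^2) \le \|f_k\|^2/4.$

The second sum can be written as $25\|\Lambda_k\|^2/4$ for

$\Lambda_k := \sum_j\Lambda_{k,j}e_j$ with $\Lambda_{k,j} := \widetilde T_{j,k}/(\theta_k - \theta_j)$ if $j \ne k$, and $\Lambda_{k,j} := 0$ if $j = k$.

Our bound for $\|f_k\|^2$ (with an untidy 25/3 increased to 9) then takes the convenient
form

(22)    $\|f_k\|^2 \le 9\|\Lambda_k\|^2$ if $\epsilon_k > 5\|\Delta\|$.

For our applications, $\mathbb{P}\|\Lambda_k\|^2$ will be of order $O(k^2/n)$.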

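When $\delta$ is much smaller than $\epsilon_k$ we can get an even better approximation for $f_k$
itself. Start once more from equality (21), still assuming that $\epsilon_k > 5\delta$. For $j \ne k$,

$\sigma_k\sigma_{j,k} = \sigma_k\langle\Delta\tilde e_k, e_j\rangle/(\tilde\theta_k - \theta_j)$
$= \langle\Delta(e_k + f_k), e_j\rangle/(\theta_k + \gamma_k - \theta_j)$ where $\gamma_k = \tilde\theta_k - \theta_k$
$= \Lambda_{k,j}\Big(1 + \dfrac{\gamma_k}{\theta_j - \tilde\theta_k}\Big) + \dfrac{\langle\Delta f_k, e_j\rangle}{\tilde\theta_k - \theta_j}$ because $\langle Te_k, e_j\rangle = 0$
$= \Lambda_{k,j} + r_{k,j}$ where $r_{k,j} := \dfrac{\gamma_k}{\theta_j - \tilde\theta_k}\Lambda_{k,j} + \dfrac{\langle\Delta f_k, e_j\rangle}{\tilde\theta_k - \theta_j}.$

The $r_{k,j}$'s are small:

(23)    $|r_{k,j}| \le \dfrac{5\delta\|\Lambda_k\|}{|\theta_k - \theta_j|}$ by inequality (22).

Define $r_{k,k} = |\sigma_{k,k}| - 1 = -\tfrac12\|f_k\|^2$ and $r_k = \sum_{j\in\mathbb{N}}r_{k,j}e_j$. We then have a
representation (cf. Hall and Hosseini-Nasab, 2006, equation 2.8 and Cai and Hall,
2006, §5.6)

(24)    $f_k = \sigma_k\tilde e_k - e_k = \big(\sigma_k\langle\tilde e_k, e_k\rangle - 1\big)e_k + \sum_j^*\sigma_k\sigma_{j,k}e_j = \Lambda_k + r_k.$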



6.3. Approximation of projections. The operator $H_J = \sum_{k\in J}e_k \otimes e_k$ projects
elements of $\mathcal{H}$ orthogonally onto $\mathrm{span}\{e_k : k \in J\}$; the operator $\widetilde H_J = \sum_{k\in J}\tilde e_k \otimes \tilde e_k$
projects elements of $\mathcal{H}$ orthogonally onto $\mathrm{span}\{\tilde e_k : k \in J\}$. We will be
interested in the case $J = \{1, 2, \dots, p\}$ with $p$ equal to either the $m$ or the $N$ from
Section 5. In that case, we also write $H_p$ and $\widetilde H_p$ for the projection operators.

In this subsection we establish a bound for $\|H_J\mathbb{B} - \widetilde H_J\mathbb{B}\|$ for a $\mathbb{B} = \sum_jb_je_j$
in $\mathcal{H}$.
The difference $\widetilde H_J - H_J$ equals

$\sum_{k\in J}(\sigma_k\tilde e_k) \otimes (\sigma_k\tilde e_k) - e_k \otimes e_k$
$= \sum_{k\in J}\sigma_k\tilde e_k \otimes r_k + (e_k + f_k) \otimes \Lambda_k + (e_k + \Lambda_k + r_k) \otimes e_k - e_k \otimes e_k$
$= \mathcal{R}_J + \sum_{k\in J}e_k \otimes \Lambda_k + \Lambda_k \otimes e_k$

where $\mathcal{R}_J := \sum_{k\in J}\sigma_k\tilde e_k \otimes r_k + f_k \otimes \Lambda_k + r_k \otimes e_k$.

Self-adjointness of $\widetilde T$ implies $\widetilde T_{j,k} = \widetilde T_{k,j}$ and hence $\Lambda_{j,k} = -\Lambda_{k,j}$. The
anti-symmetry eliminates some terms from the main contribution to $\widetilde H_J - H_J$:

$\sum_{k\in J}e_k \otimes \Lambda_k + \Lambda_k \otimes e_k = \sum_{k\in J}\sum_{j\in J^c}\Lambda_{k,j}\big(e_k \otimes e_j + e_j \otimes e_k\big).$

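With this simplification we get the following bound for $\|(\widetilde H_J - H_J)\mathbb{B}\|^2$:

$3\Big\|\sum_{k\in J}e_k\sum_{j\in J^c}\Lambda_{k,j}b_j\Big\|^2 + 3\Big\|\sum_{j\in J^c}e_j\sum_{k\in J}\Lambda_{k,j}b_k\Big\|^2 + 3\|\mathcal{R}_J\mathbb{B}\|^2.$

The first two sums contribute

$3\sum_{k\in J}\Big(\sum_{j\in J^c}\Lambda_{k,j}b_j\Big)^2 + 3\sum_{j\in J^c}\Big(\sum_{k\in J}\Lambda_{k,j}b_k\Big)^2.$

In the next section the expected value of both sums will simplify because $\mathbb{P}\Lambda_{k,j}\Lambda_{k,j'}$
will be zero if $j \ne j'$.

For the three contributions to the bound for $\|\mathcal{R}_J\mathbb{B}\|^2$ we make repeated use of
the inequality, based on equations (22) and (23),

$|\langle r_k, x\rangle| \le \tfrac92\|\Lambda_k\|^2|x_k| + 5\delta\|\Lambda_k\|\sum_j^*\dfrac{|x_j|}{|\theta_k - \theta_j|},$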

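which is valid whenever $\epsilon_k > 5\delta$. To avoid an unnecessary calculation of precise
constants, we adopt the convention of the variable constant: we write $C$ for a
universal constant whose value might change from one line to the next. The first two
contributions are:

$\Big\|\sum_{k\in J}\sigma_k\tilde e_k\langle r_k, \mathbb{B}\rangle\Big\|^2 = \sum_{k\in J}\langle r_k, \mathbb{B}\rangle^2$
$\le C\sum_{k\in J}b_k^2\|\Lambda_k\|^4 + C\delta^2\sum_{k\in J}\|\Lambda_k\|^2\Big(\sum_j^*\dfrac{|b_j|}{|\theta_k - \theta_j|}\Big)^2$

and

$\Big\|\sum_{k\in J}f_k\langle\Lambda_k, \mathbb{B}\rangle\Big\|^2 \le \Big(\sum_{k\in J}\|f_k\|\,|\langle\Lambda_k, \mathbb{B}\rangle|\Big)^2$
$\le C\Big(\sum_{k\in J}\|\Lambda_k\|^2\Big)\sum_{k\in J}\Big(\sum_j^*\Lambda_{k,j}b_j\Big)^2.$

For the third contribution, let $x = \sum_jx_je_j$ be an arbitrary unit vector in $\mathcal{H}$. Then

$\Big|\sum_{k\in J}b_k\langle r_k, x\rangle\Big|^2$
$\le C\Big(\sum_{k\in J}|b_k|\,\|\Lambda_k\|^2\Big)^2 + C\delta^2\Big(\sum_{k\in J}|b_k|\,\|\Lambda_k\|\sum_j^*\dfrac{|x_j|}{|\theta_k - \theta_j|}\Big)^2$
$\le C\Big(\sum_{k\in J}|b_k|\,\|\Lambda_k\|^2\Big)^2 + C\delta^2\Big(\sum_{k\in J}\|\Lambda_k\|^2\Big)\sum_{k\in J}b_k^2\sum_j^*\dfrac{1}{(\theta_k - \theta_j)^2};$

take the supremum over $x$, which doesn't even appear in the last line, to get the
same bound for $\|\sum_{k\in J}b_kr_k\|^2$.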

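In summary: if $\min_{k\in J}\epsilon_k > 5\delta$ then $\|(\widetilde H_J - H_J)\mathbb{B}\|^2$ is bounded by a universal
constant times

(25)    $\sum_{k\in J}\Big(\sum_{j\in J^c}\Lambda_{k,j}b_j\Big)^2 + \sum_{j\in J^c}\Big(\sum_{k\in J}\Lambda_{k,j}b_k\Big)^2 + \sum_{k\in J}b_k^2\|\Lambda_k\|^4$
        $+ \Big(\sum_{k\in J}|b_k|\,\|\Lambda_k\|^2\Big)^2 + \Big(\sum_{k\in J}\|\Lambda_k\|^2\Big)\sum_{k\in J}\Big(\sum_j^*\Lambda_{k,j}b_j\Big)^2$
        $+ \delta^2\sum_{k\in J}\|\Lambda_k\|^2\Big(\sum_j^*\dfrac{|b_j|}{|\theta_k - \theta_j|}\Big)^2$
        $+ \delta^2\Big(\sum_{k\in J}\|\Lambda_k\|^2\Big)\sum_{k\in J}b_k^2\sum_j^*\dfrac{1}{(\theta_k - \theta_j)^2}.$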

7. Unknown Gaussian distribution. When $\mu$ and $K$ are unknown, we estimate
them in the usual way: $\tilde\mu_n(t) = \bar X_n(t) = n^{-1}\sum_{i\le n}X_i(t)$ and

$\widetilde K(s,t) = (n-1)^{-1}\sum_{i\le n}\big(X_i(s) - \bar X_n(s)\big)\big(X_i(t) - \bar X_n(t)\big)$
$= (n-1)^{-1}\sum_{i\le n}\big(Z_i(s) - \bar Z(s)\big)\big(Z_i(t) - \bar Z(t)\big),$

which has spectral representation

$\widetilde K(s,t) = \sum_{k\in\mathbb{N}}\tilde\theta_k\tilde\phi_k(s)\tilde\phi_k(t).$

In fact we must have $\tilde\theta_k = 0$ for $k \ge n$ because all the eigenfunctions $\tilde\phi_k$
corresponding to nonzero $\tilde\theta_k$'s must lie in the $(n-1)$-dimensional space spanned by
$\{Z_i - \bar Z : i = 1, 2, \dots, n\}$.
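
The rank constraint is easy to see numerically once the kernel is discretized. The sketch below, assuming NumPy, replaces the processes by random-walk paths on a grid of 40 points (an illustrative stand-in, not the paper's model) and computes the eigenvalues of the resulting sample covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(4)
n, n_grid = 6, 40
Z = rng.standard_normal((n, n_grid)).cumsum(axis=1)   # n discretized sample paths
Zc = Z - Z.mean(axis=0)                               # subtract the sample mean path
K_tilde = Zc.T @ Zc / (n - 1)                         # sample covariance on the grid
eig = np.sort(np.linalg.eigvalsh(K_tilde))[::-1]      # eigenvalues, decreasing
negligible = np.abs(eig[n - 1:]) < 1e-8 * eig[0]
```

Only the first $n - 1$ entries of `eig` can be nonzero; the remaining ones vanish up to rounding error.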

The construction and analysis of the new estimator $\widehat{\mathbb{B}}$ will parallel the method
developed in Section 5 for the case of known $K$ and $\mu$. The quantities $m$ and $N$
are the same as before. We write $\widetilde H_p$ (for $p = N$ or $p = m$) for the operator
that projects orthogonally onto $\mathrm{span}\{\tilde\phi_1, \dots, \tilde\phi_p\}$. Essentially we have only to
estimate all the quantities that appeared in the previous proof then show that none of
the errors of estimation is large enough to upset analogs of the calculations from
Section 5. There is a slight complication caused by the fact that we do not know
which of $\pm\tilde\phi_j$ should be used to approximate $\phi_j$. At strategic moments we will
be forced to multiply by the matrix $S := \mathrm{diag}(\sigma_0, \dots, \sigma_N)$ with $\sigma_0 = 1$ and
$\sigma_k = \mathrm{sign}(\langle\phi_k, \tilde\phi_k\rangle)$ for $k \ge 1$. The results from Section 6 will control the
difference $f_k := \sigma_k\tilde\phi_k - \phi_k$. The other key quantities are:

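(i) $\Delta := \widetilde K - K$

(ii) $\widetilde D = \mathrm{diag}(1,\widetilde\theta_1,\dots,\widetilde\theta_N)^{1/2}$

(iii) $\widetilde z_i = (\widetilde z_{i,1},\dots,\widetilde z_{i,N})'$ where $\widetilde z_{i,k} = \langle Z_i,\widetilde\phi_k\rangle$

(iv) $\widetilde z_\cdot = (\widetilde z_{\cdot 1},\dots,\widetilde z_{\cdot N})'$ where $\widetilde z_{\cdot k} = \langle\overline Z,\widetilde\phi_k\rangle = n^{-1}\sum_{i\le n}\widetilde z_{i,k}$

(v) $\widetilde\xi_i = (1,\widetilde z_i' - \widetilde z_\cdot')'$ and $\widetilde\eta_i = D^{-1}\widetilde\xi_i$. [We could define $\widetilde\eta_i = \widetilde D^{-1}\widetilde\xi_i$ but then we
would need to show that $\widetilde D^{-1}\widetilde\xi_i \approx D^{-1}\widetilde\xi_i$. Our definition merely rearranges
the approximation steps.]

(vi) $\widetilde\gamma := (\widetilde\gamma_0,\widetilde b_1,\dots,\widetilde b_N)'$ where $\mathbb B = \sum_{k\in\mathbb N}\widetilde b_k\widetilde\phi_k$ and $\widetilde\gamma_0 := a + \langle\mathbb B,\overline X\rangle$. [Note
that $\lambda_i = \widetilde\gamma_0 + \langle\mathbb B, Z_i-\overline Z\rangle$.]

(vii) $\widetilde\lambda_{i,N} = \widetilde\gamma_0 + \langle\widetilde H_N\mathbb B, Z_i-\overline Z\rangle = \widetilde\xi_i'\widetilde\gamma$

(viii) $\widetilde g = \arg\max_{g\in\mathbb R^{N+1}}\sum_{i\le n}\bigl(y_i(\widetilde\xi_i'g) - \psi(\widetilde\xi_i'g)\bigr)$ and

$$\widetilde{\mathbb B} = \sum_{1\le k\le m}\widetilde g_k\widetilde\phi_k.$$

[Note that these two quantities differ from the $\widehat g$ and $\widehat{\mathbb B}$ in Section 5.]

(ix) $\widetilde A_n = n^{-1}\sum_{i\le n}\widetilde\eta_i\widetilde\eta_i'\,\psi^{(2)}(\widetilde\lambda_{i,N})$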
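The sign ambiguity described before the list of key quantities can be illustrated in finite dimensions. The sketch below assumes a hypothetical diagonal covariance matrix with well-separated eigenvalues and a small symmetric perturbation (both invented for illustration); it computes the signs $\sigma_k$ and the differences $f_k$ between the sign-corrected estimated eigenvectors and the true ones.

```python
import numpy as np

def sign_alignment(phi, phi_tilde):
    """sigma_k = sign(<phi_k, phi_tilde_k>) for eigenvectors stored as columns."""
    inner = np.sum(phi * phi_tilde, axis=0)
    return np.where(inner >= 0, 1.0, -1.0)

rng = np.random.default_rng(1)
d = 6
theta = 1.0 / np.arange(1, d + 1) ** 2         # hypothetical well-separated eigenvalues
K = np.diag(theta)                             # true eigenvectors are the coordinate axes
noise = rng.standard_normal((d, d)) * 1e-4
K_tilde = K + (noise + noise.T) / 2            # small symmetric perturbation of K
_, vecs = np.linalg.eigh(K_tilde)              # eigh returns ascending eigenvalues
phi_tilde = vecs[:, ::-1]                      # reorder to descending eigenvalues
phi = np.eye(d)
sigma = sign_alignment(phi, phi_tilde)
f = phi_tilde * sigma - phi                    # column k is sigma_k * phi_tilde_k - phi_k
print(np.max(np.linalg.norm(f, axis=0)))
```

Without the multiplication by `sigma`, a column of `phi_tilde` returned with the opposite sign would sit at distance close to 2 from the corresponding true eigenvector even though the two span the same line.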

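The use of estimated quantities has one simplifying consequence:

$$Z_i(t) - \overline Z(t) = \sum_{k\in\mathbb N}(\widetilde z_{i,k} - \widetilde z_{\cdot k})\,\widetilde\phi_k(t),$$

so that

$$(n-1)^{-1}\sum_{i\le n}(\widetilde z_{i,j} - \widetilde z_{\cdot j})(\widetilde z_{i,k} - \widetilde z_{\cdot k}) = \widetilde\theta_k\{j = k\},$$

which implies $(n-1)^{-1}\sum_{i\le n}\widetilde\xi_i\widetilde\xi_i' = \widetilde D^2$ and

(26)   $(n-1)^{-1}\sum_{i\le n}\widetilde\eta_i\widetilde\eta_i' = D^{-1}\widetilde D^2D^{-1} = \mathrm{diag}(1,\widetilde\theta_1/\theta_1,\dots,\widetilde\theta_N/\theta_N).$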



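We will analyze $\widetilde K$ by rewriting it using the eigenfunctions for $K$. Remember
that $z_{i,j} = \langle Z_i,\phi_j\rangle$ and the standardized variables $\eta_{i,j} = z_{i,j}/\sqrt{\theta_j}$ are independent
$N(0,1)$'s. Define $z_{\cdot j} = \langle\overline Z,\phi_j\rangle$ and $\eta_{\cdot j} = n^{-1}\sum_{i\le n}\eta_{i,j}$ and

$$S_{j,k} := (n-1)^{-1}\sum_{i\le n}(\eta_{i,j} - \eta_{\cdot j})(\eta_{i,k} - \eta_{\cdot k}),$$

a sample covariance between two independent $N(0, I_n)$ random vectors. Then

(27)   $\widetilde K(s,t) = \sum_{j,k\in\mathbb N}\widetilde K_{j,k}\,\phi_j(s)\phi_k(t)$ with $\widetilde K_{j,k} = \sqrt{\theta_j\theta_k}\,S_{j,k}$.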

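Moreover, as shown in Section 6, the main contribution to $f_k = \sigma_k\widetilde\phi_k - \phi_k$ is

$$\Lambda_k := \sum_{j\in\mathbb N}\{j\ne k\}\,\Lambda_{k,j}\,\phi_j \qquad\text{with}\qquad \Lambda_{k,j} := \frac{\sqrt{\theta_k\theta_j}\,S_{k,j}}{\theta_k - \theta_j}.$$

In fact, most of the inequalities that we need to study the new $\widetilde{\mathbb B}$ come from simple
moment bounds (Lemma 31) for the sample covariances $S_{j,k}$ and the derived
bounds (Lemma 32) for the $\Lambda_k$'s.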

As before, most of the analysis will be conditional on the $X_i$'s lying in a set with
high probability on which the various estimators and other random quantities are
well behaved.

LEMMA 28. For each $\varepsilon > 0$ there exists a set $\mathcal X_{\varepsilon,n}$, depending on $\mu$ and $K$, with

$$\sup_{\mathcal F}\mathbb P_{n,\mu,K}\mathcal X_{\varepsilon,n}^c < \varepsilon \qquad\text{for all large enough } n$$

and on which, for some constant $C_\varepsilon$ that does not depend on $\mu$ or $K$,

(i) $\|\Delta\| \le C_\varepsilon n^{-1/2}$

(ii) $\max_{i\le n}\|Z_i\| \le C_\varepsilon\sqrt{\log n}$ and $\|\overline Z\| \le C_\varepsilon n^{-1/2}$

(iii) $\|(\widetilde H_m - H_m)\mathbb B\|^2 = O_{\mathcal F}(\rho_n)$

(iv) $\|(\widetilde H_N - H_N)\mathbb B\|^2 = O_{\mathcal F}(n^{-1-\nu})$ for some $\nu > 0$ that depends only on $\alpha$
and $\beta$

(v) $\max_{i\le n}|\widetilde\eta_i|^2 = o_{\mathcal F}(\sqrt n/N)$

(vi) $\|S\widetilde A_nS - A_n\|_2 = o_{\mathcal F}(1)$

This Lemma (whose proof appears in Section 8) contains everything we need to
show that $\|\widetilde{\mathbb B} - \mathbb B\|^2$ has the uniform $O_{\mathcal F}(\rho_n)$ rate of convergence in $\mathbb P_{n,f}$ probability,
as asserted by equation (3). In what follows, all assertions refer to the numbered
parts of Lemma 28.

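As before, the component of $\mathbb B$ orthogonal to $\mathrm{span}\{\widetilde\phi_1,\dots,\widetilde\phi_m\}$ causes no trouble
because

$$\|\widetilde{\mathbb B} - \mathbb B\|^2 \le |\widetilde g - \widetilde\gamma|^2 + \|\widetilde H_m^\perp\mathbb B\|^2$$

and, by (iii),

$$\|\widetilde H_m^\perp\mathbb B\|^2 \le 2\|H_m^\perp\mathbb B\|^2 + 2\|(\widetilde H_m - H_m)\mathbb B\|^2 = O_{\mathcal F}(\rho_n) \qquad\text{on } \mathcal X_{\varepsilon,n}.$$

To handle $|\widetilde g - \widetilde\gamma|^2$, invoke Corollary 9 for $X_i$'s in $\mathcal X_{\varepsilon,n}$, with $\eta_i$ replaced by $\widetilde\eta_i$
and $A_n$ replaced by $\widetilde A_n$ and $B_n$ replaced by $\widetilde B_n = SB_nS$, the same $B_n$ and $D$ as
before, and $Q$ equal to

$$\widetilde Q_{n,a,\mathbb B,N} = \otimes_{i\le n}Q_{\widetilde\lambda_{i,N}}$$

to get a set $\widetilde{\mathcal Y}_{m,\varepsilon}$ with $\widetilde Q_{n,a,\mathbb B,N}\widetilde{\mathcal Y}_{m,\varepsilon}^c < 2\varepsilon$ on which $|\widetilde g - \widetilde\gamma|^2 = O_{\mathcal F}(\rho_n)$. The
conditions of the Corollary are satisfied on $\mathcal X_{\varepsilon,n}$, because of (v) and

$$\|\widetilde A_n - \widetilde B_n\|_2 \le \|\widetilde A_n - SA_nS\|_2 + \|SA_nS - SB_nS\|_2 = o_{\mathcal F}(1).$$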

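To complete the proof it suffices to show that $\|\widetilde Q_{n,a,\mathbb B,N} - Q_{n,a,\mathbb B,N}\|_{TV}$ tends to
zero. First note that

$$\widetilde\lambda_{i,N} - \lambda_{i,N} = a + \langle\mathbb B,\overline X\rangle + \langle\widetilde H_N\mathbb B, Z_i - \overline Z\rangle - a - \langle\mathbb B,\mu\rangle - \langle H_N\mathbb B, Z_i\rangle$$

$$= \langle H_N^\perp\mathbb B,\overline Z\rangle - \langle\widetilde H_N\mathbb B,\overline Z\rangle + \langle H_N\mathbb B,\overline Z\rangle + \langle\widetilde H_N\mathbb B - H_N\mathbb B, Z_i\rangle,$$

which implies that, on $\mathcal X_{\varepsilon,n}$,

$$|\widetilde\lambda_{i,N} - \lambda_{i,N}|^2 \le 2|\langle H_N^\perp\mathbb B,\overline Z\rangle|^2 + 2\|\widetilde H_N\mathbb B - H_N\mathbb B\|^2(\|Z_i\| + \|\overline Z\|)^2$$

$$\le O_{\mathcal F}(N^{1-2\beta})\,C_\varepsilon^2n^{-1} + O_{\mathcal F}(n^{-1-\nu})\,C_\varepsilon^2\bigl(n^{-1/2} + \sqrt{\log n}\bigr)^2$$

(29)   $= O_{\mathcal F}(n^{-1-\nu'})$ for some $0 < \nu' < \nu$.

Now argue as in subsection 5.2: on $\mathcal X_{\varepsilon,n}$,

$$\|\widetilde Q_{n,a,\mathbb B,N} - Q_{n,a,\mathbb B,N}\|_{TV}^2 \le \exp\bigl(o_{\mathcal F}(\log n)\bigr)\sum_{i\le n}|\widetilde\lambda_{i,N} - \lambda_{i,N}|^2 = o_{\mathcal F}(1).$$

Finish the argument as before, by splitting into contributions from $\mathcal X_{\varepsilon,n}^c$ and $\mathcal X_{\varepsilon,n}\cap\widetilde{\mathcal Y}_{m,\varepsilon}^c$
and $\mathcal X_{\varepsilon,n}\cap\widetilde{\mathcal Y}_{m,\varepsilon}$.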

8. Proofs of unproven assertions from Section 7. Many of the inequalities
in this section involve sums of functions of the $\theta_j$'s. The following result will save
us a lot of repetition. To simplify the notation, we drop the subscripts from $\mathbb P_{n,\mu,K}$.

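LEMMA 30.

(i) For each $r \ge 1$ there is a constant $C_r = C_r(\mathcal F)$ for which

$$\kappa_k(r,\gamma) := \sum_{j\in\mathbb N}\{j\ne k\}\,\frac{j^{-\gamma}}{|\theta_j - \theta_k|^r} \le
\begin{cases} C_r\bigl(1 + k^{r(1+\alpha)-\gamma}\bigr) & \text{if } r > 1\\ C_1\bigl(1 + k^{1+\alpha-\gamma}\log k\bigr) & \text{if } r = 1.\end{cases}$$

(ii) For each $p$,

$$\sum_{k\le p}\sum_{j>p}k^{-\alpha-2\beta}\,j^{-\alpha}\,(\theta_k - \theta_j)^{-2} = O_{\mathcal F}(p^{1-\alpha}).$$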

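PROOF. For (i), argue in the same way as Hall and Horowitz (2007, page 85),
using the lower bounds

$$|\theta_j - \theta_k| \ge \begin{cases} c_\alpha\,j^{-\alpha} & \text{if } j < k/2\\ c_\alpha\,|j-k|\,k^{-\alpha-1} & \text{if } k/2 \le j \le 2k\\ c_\alpha\,k^{-\alpha} & \text{if } j > 2k\end{cases}$$

where $c_\alpha$ is a positive constant.

For (ii), split the range of summation into two subsets: $\{(k,j) : j > \max(p,2k)\}$
and $\{(k,j) : p/2 < k \le p < j \le 2k\}$. The first subset contributes at most

$$\sum_{k\le p}k^{-\alpha-2\beta}\sum_{j>\max(p,2k)}j^{-\alpha}(c_\alpha k^{-\alpha})^{-2} = O_{\mathcal F}(p^{1-\alpha})$$

because $\alpha - 2\beta < -3$. The second subset contributes at most

$$\sum_{p/2<k\le p}k^{-\alpha-2\beta}\,c_\alpha^{-2}k^{2\alpha+2}\sum_{j>p}j^{-\alpha}(j-k)^{-2} = O_{\mathcal F}\bigl(p\cdot p^{\alpha+2-2\beta}\cdot p^{-\alpha}\,O(1)\bigr),$$

which is of order $o_{\mathcal F}(p^{-\alpha})$. □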
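The three-regime lower bound used for part (i) can be checked directly in the simplest case. The sketch below assumes exactly polynomial eigenvalues $\theta_j = j^{-\alpha}$ and one admissible constant `c` obtained from the mean value theorem; both are illustrative assumptions, not the paper's class of eigenvalue sequences or its constant $c_\alpha$.

```python
def spacing_lower_bound(j, k, alpha, c):
    """Three-regime lower bound for |theta_j - theta_k| when theta_j = j**(-alpha)."""
    if 2 * j < k:                              # regime j < k/2
        return c * j ** (-alpha)
    if j <= 2 * k:                             # regime k/2 <= j <= 2k
        return c * abs(j - k) * k ** (-alpha - 1)
    return c * k ** (-alpha)                   # regime j > 2k

alpha = 1.5                                    # hypothetical decay exponent
# One admissible constant for this exact-power example (an assumption for illustration).
c = min(1 - 2 ** (-alpha), alpha * 2 ** (-alpha - 1))
violations = [
    (j, k)
    for k in range(1, 201)
    for j in range(1, 801)
    if j != k
    and abs(j ** (-alpha) - k ** (-alpha)) < spacing_lower_bound(j, k, alpha, c) - 1e-15
]
print(len(violations))
```

The middle regime is the delicate one: there the gap is only of order $|j-k|k^{-\alpha-1}$, which is what produces the extra factor $\log k$ when $r = 1$.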
The distribution of $S_{j,k}$ does not depend on the parameters of our model. Indeed,
by the usual rotation of axes we can rewrite $(n-1)S_{j,k}$ as $U_j'U_k$, where $U_1, U_2, \dots$
are independent $N(0, I_{n-1})$ random vectors. This representation gives some useful
equalities and bounds.

LEMMA 31. Uniformly over distinct $j, k, \ell$,

(i) $\mathbb P S_{j,j} = 1$ and $\mathbb P(S_{j,j} - 1)^2 = 2(n-1)^{-1}$

(ii) $\mathbb P S_{j,k} = \mathbb P S_{j,k}S_{j,\ell} = 0$

(iii) $\mathbb P S_{j,k}^2 = O(n^{-1})$

(iv) $\mathbb P S_{j,k}^2S_{\ell,k}^2 = O(n^{-2})$

(v) $\mathbb P S_{j,k}^4 = O(n^{-2})$

PROOF. Assertion (i) is classical because $|U_j|^2 \sim \chi^2_{n-1}$. For assertion (ii) use
$\mathbb P(U_1'U_2 \mid U_2) = 0$ and

$$\mathbb P(U_1'U_2U_2'U_3 \mid U_2) = \mathrm{trace}\bigl(U_2U_2'\,\mathbb P(U_3U_1')\bigr) = 0.$$

For (iii) use $\mathbb P(U_1U_1') = I_{n-1}$ and

$$\mathbb P(U_1'U_2U_2'U_1 \mid U_2) = \mathrm{trace}\bigl(U_2U_2'\,\mathbb P(U_1U_1')\bigr) = \mathrm{trace}(U_2U_2') = |U_2|^2.$$

For (iv) use $\mathbb P|U_1|^4 = n^2 - 1$ and

$$\mathbb P\bigl((U_1'U_2)^2(U_3'U_2)^2 \mid U_2\bigr) = |U_2|^4.$$

For (v), check that the coefficient of $t^4$ in the Taylor expansion of

$$\mathbb P\exp(tU_1'U_2) = \mathbb P\exp\bigl(\tfrac12t^2|U_1|^2\bigr) = (1 - t^2)^{-(n-1)/2}$$

is of order $n^2$. □
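The moment calculations in the proof can be compared with simulation. The sketch below assumes a hypothetical sample size `n` and Monte Carlo size `reps`; it uses the representation of the sample covariances through independent $N(0, I_{n-1})$ vectors and the exact values implied by the chi-square moments and by the $t^4$ coefficient $d(d+2)/8$ of $(1-t^2)^{-d/2}$, with $d = n - 1$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 11, 200_000                          # hypothetical sample size and Monte Carlo size
d = n - 1
U1 = rng.standard_normal((reps, d))            # independent N(0, I_{n-1}) vectors
U2 = rng.standard_normal((reps, d))
S_jj = np.sum(U1 * U1, axis=1) / d             # (n-1) * S_jj = |U_j|^2
S_jk = np.sum(U1 * U2, axis=1) / d             # (n-1) * S_jk = U_j' U_k

# Exact values from the chi-square and moment generating function calculations.
exact = {
    "E S_jj": 1.0,
    "Var S_jj": 2.0 / d,
    "E S_jk^2": 1.0 / d,
    "E S_jk^4": 3.0 * d * (d + 2) / d**4,      # 4! times the t^4 coefficient, divided by d^4
}
estimate = {
    "E S_jj": S_jj.mean(),
    "Var S_jj": S_jj.var(),
    "E S_jk^2": np.mean(S_jk**2),
    "E S_jk^4": np.mean(S_jk**4),
}
for name in exact:
    print(f"{name}: exact {exact[name]:.4f}, simulated {estimate[name]:.4f}")
```

The fourth moment equals $3(n-1)(n+1)/(n-1)^4$, which makes the $O(n^{-2})$ rate in assertion (v) explicit.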

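LEMMA 32. Uniformly over distinct $j, k, \ell$,

(i) $\mathbb P\Lambda_{k,j} = \mathbb P\Lambda_{k,j}\Lambda_{k,\ell} = 0$

(ii) $\mathbb P\Lambda_{k,j}^2 = O_{\mathcal F}\bigl(n^{-1}k^{-\alpha}j^{-\alpha}(\theta_k - \theta_j)^{-2}\bigr)$

(iii) $\mathbb P\Lambda_{k,j}^4 = O_{\mathcal F}\bigl(n^{-2}k^{-2\alpha}j^{-2\alpha}(\theta_k - \theta_j)^{-4}\bigr)$

(iv) $\mathbb P\|\Lambda_k\|^2 = O_{\mathcal F}(n^{-1}k^2)$

(v) $\mathbb P\|\Lambda_k\|^4 = O_{\mathcal F}(n^{-2}k^4)$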

PROOF. Assertions (i), (ii), and (iii) follow from Assertions (ii) and (iii) of 
Lemma 31. For (iv), note that 

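$$\mathbb P\|\Lambda_k\|^2 = \sum_{j\ne k}\mathbb P\Lambda_{k,j}^2 = O_{\mathcal F}(n^{-1}k^{-\alpha})\sum_{j\ne k}j^{-\alpha}(\theta_k - \theta_j)^{-2},$$

which is of order $O_{\mathcal F}(n^{-1}k^2)$ by Lemma 30(i) with $r = 2$ and $\gamma = \alpha$.

For (v) note that

$$\mathbb P\|\Lambda_k\|^4 = \mathbb P\Bigl(\sum_{j\ne k}\theta_k\theta_j\,S_{k,j}^2\,(\theta_k - \theta_j)^{-2}\Bigr)^2$$

$$= \sum_{j\ne k}\sum_{\ell\ne k}\theta_k^2\theta_j\theta_\ell\,(\theta_k - \theta_j)^{-2}(\theta_k - \theta_\ell)^{-2}\,\mathbb P S_{k,j}^2S_{k,\ell}^2$$

$$= O_{\mathcal F}(n^{-2})\Bigl(\sum_{j\ne k}\theta_k\theta_j(\theta_k - \theta_j)^{-2}\Bigr)^2$$

$$= O_{\mathcal F}(n^{-2}k^4).$$

□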

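To prove Lemma 28 we define $\mathcal X_{\varepsilon,n}$ as an intersection of sets chosen to make the
six assertions of the Lemma hold,

$$\mathcal X_{\varepsilon,n} := \mathcal X_{\Delta,n}\cap\mathcal X_{Z,n}\cap\mathcal X_{\eta,n}\cap\mathcal X_{\widetilde\eta,n}\cap\mathcal X_{\Lambda,n},$$

where the complement of each of the five sets appearing on the right-hand side
has probability less than $\varepsilon/5$. More specifically, for a large enough constant $C_\varepsilon$, we
define

$\mathcal X_{\Delta,n} = \{\|\Delta\| \le C_\varepsilon n^{-1/2}\}$

$\mathcal X_{Z,n} = \{\max_{i\le n}\|Z_i\|^2 \le C_\varepsilon\log n \text{ and } \|\overline Z\| \le C_\varepsilon n^{-1/2}\}$

$\mathcal X_{\eta,n} = \{\max_{i\le n}|\eta_i|^2 \le C_\varepsilon N\log n\}$ as in Section 5

$\mathcal X_{\widetilde\eta,n} = \{\|(n-1)^{-1}\sum_{i\le n}\widetilde\eta_i\widetilde\eta_i'\|_2 \le C_\varepsilon\}$

The definition of $\mathcal X_{\Lambda,n}$, in subsection 8.3, is slightly more complicated. It is defined
by requiring various functions of the $\Lambda_k$'s to be smaller than $C_\varepsilon$ times their expected
values.

The set $\mathcal X_{\widetilde\eta,n}$ is almost redundant. From Definition 5 we know that

$$\min_{1\le j<j'\le N}|\theta_j - \theta_{j'}| \ge (\alpha/R)N^{-1-\alpha} \qquad\text{and}\qquad \min_{1\le j\le N}\theta_j \ge R^{-1}N^{-\alpha}.$$

The choice $N \sim n^{\zeta}$ with $\zeta < (2+2\alpha)^{-1}$ ensures that $n^{1/2}N^{-1-\alpha} \to \infty$. On $\mathcal X_{\Delta,n}$
the spacing assumption used in Section 6 holds for all $n$ large enough; all the
bounds from that Section are available to us on $\mathcal X_{\varepsilon,n}$. In particular,

$$\max_{j\le N}|\widetilde\theta_j/\theta_j - 1| \le O_{\mathcal F}(N^{\alpha}\|\Delta\|) = o_{\mathcal F}(1).$$

Equality (26) shows that $\mathcal X_{\widetilde\eta,n} \supseteq \mathcal X_{\Delta,n}$ eventually if we make sure $C_\varepsilon > 1$.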
8.1. Proof of Lemma 28 part (i). Observe that

$$\mathbb P\|\Delta\|^2 = \sum_{j,k}\mathbb P\bigl(\widetilde K_{j,k} - \theta_j\{j = k\}\bigr)^2 = \sum_{j,k}\theta_j\theta_k\,\mathbb P\bigl(S_{j,k} - \{j = k\}\bigr)^2$$

$$\le \sum_j\theta_j^2\,O(n^{-1}) + \sum_{j\ne k}\theta_j\theta_k\,O(n^{-1}) = O_{\mathcal F}(n^{-1}).$$



8.2. Proof of Lemma 28 part (ii). As before, Corollary 42 controls $\max_{i\le n}\|Z_i\|^2$.
To control the $\overline Z$ contribution, note that $n\|\overline Z\|^2$ has the same distribution as $\|Z_1\|^2$,
which has expected value $\sum_{j\in\mathbb N}\theta_j < \infty$.
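The distributional identity for the sample mean can be illustrated by simulation. The sketch below assumes a truncated expansion with `J` terms and hypothetical eigenvalues $\theta_j = j^{-2}$ (illustrative choices only); it compares the simulated means of $\|Z_1\|^2$ and $n\|\overline Z\|^2$ with $\sum_j\theta_j$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, J, reps = 10, 20, 20_000                    # hypothetical sizes; J truncates the expansion
theta = 1.0 / np.arange(1, J + 1) ** 2         # illustrative summable eigenvalues
eta = rng.standard_normal((reps, n, J))        # eta[r, i, j]: standardized coefficient of Z_i
z = eta * np.sqrt(theta)                       # z_{i,j} = sqrt(theta_j) * eta_{i,j}
norm_sq_Z1 = np.sum(z[:, 0, :] ** 2, axis=1)   # ||Z_1||^2 = sum_j theta_j * eta_{1,j}^2
norm_sq_Zbar = np.sum(z.mean(axis=1) ** 2, axis=1)   # ||Zbar||^2 from averaged coefficients
print(theta.sum(), norm_sq_Z1.mean(), n * norm_sq_Zbar.mean())
```

The averaged coefficients are independent $N(0,\theta_j/n)$ variables, so multiplying by $n$ restores exactly the distribution of the coefficients of a single process.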



8.3. Proof of Lemma 28 parts (iii) and (iv). Calculate expected values for all
the terms that appear in the bound (25) from Section 6.

*w£^ {E J>P A ^T +^E J>P (£ fe </^) 2 
= Ek. E^„ (&? + & D b y Lemma 32 « 



I 1 (//• — , I 

-l^l-en 



(33) =0 J ( n - 1 p ) by Lemma 30 

and 

P»mEk bil|A,H 4 = OHn- 2 )E,< ^ 4 - 2/? = ? (n- 2 )(l+^ + logp 



and 



and 

P^Efc^H^II 2 = Oy(n-V) 

and 



^ E fc < p (E • a*a) = o^n- 1 ) £^ £* k-*r»-v{e k - 9 

(34) =Oj(n _1 ) by Lemma 30 
and 

^e^iia,ii 2 (e;^) 2 

(35) = CWn-'j' 2 ) fp 3 + log 2 j, 
and

(36) 2 

E^ (E* = W+ f "« by Le™,a30. 

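For some constant $C_\varepsilon = C_\varepsilon(\mathcal F)$, on a set $\mathcal X_{\Lambda,n}$ with $\mathbb P_{n,\mu,K}\mathcal X_{\Lambda,n}^c < \varepsilon$, each of the
random quantities in the previous set of inequalities (for both $p = m$ and $p = N$)
is bounded by $C_\varepsilon$ times its $\mathbb P_{n,\mu,K}$ expected value. By virtue of Lemma 32(iv), we
may also assume that $\|\Lambda_k\|^2 \le C_\varepsilon k^2/n$ on $\mathcal X_{\Lambda,n}$.

From inequality (25), it follows that on the set $\mathcal X_{\Lambda,n}\cap\mathcal X_{\Delta,n}$, for both $p = m$ and
$p = N$,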

\\(H p -H p )M\\ 2 

< Oj{n^p l - a ) + 0?(n~ 2 ) (l + p 5 " 2/3 + logp + p 6 ^ + log 2 p 
+ Oj(n-y ^(n" 1 ) + Oj{n- 2 ) (p 3 + /+ 20 - 2 / 3 l og 2 p 
+ (n~y )Oy(l + p^ 2 "" 2 / 3 l og 2 p) 

= 0- J {n- 1 p l - a ) \fp<N. 

This inequality leads to the asserted conclusions when p = m or p = N. 

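8.4. Proof of Lemma 28 part (v). By construction, $\widetilde\eta_{i,1} = 1$ for every $i$ and, for
$j \ge 2$,

$$\sqrt{\theta_j}\,\widetilde\eta_{i,j} = (\widetilde z_{i,j} - \widetilde z_{\cdot j}) = \langle Z_i - \overline Z,\widetilde\phi_j\rangle.$$

Thus, for $j \ge 2$,

$$\sigma_j\widetilde\eta_{i,j} = \theta_j^{-1/2}\langle Z_i - \overline Z,\phi_j + f_j\rangle = \eta_{i,j} + \delta_{i,j}$$

with

$$|\delta_{i,j}|^2 \le \theta_j^{-1}\bigl(\|Z_i\| + \|\overline Z\|\bigr)^2\|f_j\|^2 \le O_{\mathcal F}\Bigl(\frac{j^{2+\alpha}\log n}{n}\Bigr) \qquad\text{on } \mathcal X_{\varepsilon,n}.$$

In vector form,

(37)   $S\widetilde\eta_i = \eta_i + \delta_i$ with $|\delta_i|^2 = O_{\mathcal F}\bigl(N^{3+\alpha}n^{-1}\log n\bigr) \le o_{\mathcal F}(n/N^2)$ on $\mathcal X_{\varepsilon,n}$.

It follows that

$$\max_{i\le n}|\widetilde\eta_i| = \max_{i\le n}|S\widetilde\eta_i| \le \max_{i\le n}|\eta_i| + o_{\mathcal F}(\sqrt n/N) = o_{\mathcal F}(\sqrt n/N) \qquad\text{on } \mathcal X_{\varepsilon,n}.$$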
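8.5. Proof of Lemma 28 part (vi). From inequality (29) we know that

$$\epsilon_N := \max_{i\le n}|\widetilde\lambda_{i,N} - \lambda_{i,N}| = O_{\mathcal F}\bigl(n^{-(1+\nu')/2}\bigr) \qquad\text{on } \mathcal X_{\varepsilon,n}$$

and from subsection 5.2 we have $\max_{i\le n}|\lambda_{i,N}| = O_{\mathcal F}(\sqrt{\log n})$. Assumption ($\psi$3)
in Section 3 and the Mean-Value theorem then give

$$\max_{i\le n}\bigl|\psi^{(2)}(\widetilde\lambda_{i,N}) - \psi^{(2)}(\lambda_{i,N})\bigr| \le \epsilon_N\max_{i\le n}\psi^{(2)}(\lambda_{i,N})\,G(\epsilon_N) = o_{\mathcal F}(1).$$

If we replace $\psi^{(2)}(\widetilde\lambda_{i,N})$ in the definition of $\widetilde A_n$ by $L_i := \psi^{(2)}(\lambda_{i,N})$ we make a
change $\Gamma$ with

$$\|\Gamma\|_2 \le o_{\mathcal F}(1)\,\Bigl\|(n-1)^{-1}\sum_{i\le n}\widetilde\eta_i\widetilde\eta_i'\Bigr\|_2,$$

which, by equality (26), is of order $o_{\mathcal F}(1)$ on $\mathcal X_{\varepsilon,n}$.

From Assumption ($\psi$2) we have $c_n := \log\max_{i\le n}L_i = o_{\mathcal F}(\log n)$. Uniformly
over all unit vectors $u$ in $\mathbb R^{N+1}$ we therefore have

$$u'S\widetilde A_nSu = o_{\mathcal F}(1) + (n-1)^{-1}\sum_{i\le n}L_i\,u'(\eta_i + \delta_i)(\eta_i + \delta_i)'u$$

$$= o_{\mathcal F}(1) + \bigl(1 + O(n^{-1})\bigr)u'A_nu + (n-1)^{-1}\sum_{i\le n}L_i\bigl((u'\delta_i)^2 + 2(u'\eta_i)(u'\delta_i)\bigr).$$

Rearrange then take a supremum over $u$ to conclude that

$$\|S\widetilde A_nS - A_n\|_2 \le o_{\mathcal F}(1) + O_{\mathcal F}(e^{c_n})\max_{i\le n}\bigl(|\delta_i|^2 + 2|\eta_i|\,|\delta_i|\bigr).$$

Representation (37) and the defining property of $\mathcal X_{\eta,n}$ then ensure that the upper
bound is of order $o_{\mathcal F}(1)$ on $\mathcal X_{\varepsilon,n}$.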

9. The minimax lower bound. We will apply a slight variation on Assouad's 
Lemma — combining ideas from Yu (1997) and from van der Vaart (1998, Sec- 
tion 24.3) — to establish inequality (2). 

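We consider behavior only for $\mu = 0$ and $a = 0$, for a fixed $K$ with spectral
decomposition $\sum_{j\in\mathbb N}\theta_j\phi_j\otimes\phi_j$. For simplicity we abbreviate $\mathbb P_{n,0,K}$ to $\mathbb P$. Let
$J = \{m+1, m+2, \dots, 2m\}$ and $\Gamma = \{0,1\}^J$. Let $\beta_j = Rj^{-\beta}$. For each $\gamma$ in $\Gamma$
define $\mathbb B_\gamma = \epsilon\sum_{j\in J}\gamma_j\beta_j\phi_j$, for a small $\epsilon > 0$ to be specified, and write $Q_\gamma$ for
the product measure $\otimes_{i\le n}Q_{\lambda_i(\gamma)}$ with

$$\lambda_i(\gamma) = \langle\mathbb B_\gamma, Z_i\rangle = \epsilon\sum_{j\in J}\gamma_j\beta_jz_{i,j}.$$

For each $j$ let $\Gamma_j = \{\gamma\in\Gamma : \gamma_j = 1\}$ and let $\psi_j$ be the bijection on $\Gamma$ that flips
the $j$th coordinate but leaves all other coordinates unchanged. Let $\pi$ be the uniform
distribution on $\Gamma$, that is, $\pi_\gamma = 2^{-m}$ for each $\gamma$.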
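For each estimator $\widehat{\mathbb B} = \sum_{j\in\mathbb N}\widehat b_j\phi_j$ we have $\|\mathbb B_\gamma - \widehat{\mathbb B}\|^2 \ge \sum_{j\in J}(\epsilon\gamma_j\beta_j - \widehat b_j)^2$
and so

$$\sup_{\mathcal F}\mathbb P_{n,f}\|\mathbb B - \widehat{\mathbb B}\|^2 \ge \sum_{\gamma\in\Gamma}\pi_\gamma\sum_{j\in J}\mathbb P Q_\gamma(\epsilon\gamma_j\beta_j - \widehat b_j)^2$$

$$= 2^{-m}\sum_{j\in J}\sum_{\gamma\in\Gamma_j}\mathbb P\Bigl(Q_\gamma(\epsilon\beta_j - \widehat b_j)^2 + Q_{\psi_j(\gamma)}(0 - \widehat b_j)^2\Bigr)$$

(38)   $\ge 2^{-m}\sum_{j\in J}\sum_{\gamma\in\Gamma_j}\tfrac12(\epsilon\beta_j)^2\,\mathbb P\|Q_\gamma\wedge Q_{\psi_j(\gamma)}\|,$

the last lower bound coming from the fact that

$$(\epsilon\beta_j - \widehat b_j)^2 + (0 - \widehat b_j)^2 \ge \tfrac12(\epsilon\beta_j)^2 \qquad\text{for all } \widehat b_j.$$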
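The elementary inequality behind the last step follows by completing the square. A short derivation, writing $a$ for $\epsilon\beta_j$ and $b$ for the estimated coefficient (shorthand introduced here only for readability):

```latex
(a-b)^2 + (0-b)^2 \;=\; 2b^2 - 2ab + a^2
  \;=\; 2\Bigl(b - \tfrac{a}{2}\Bigr)^2 + \tfrac12 a^2 \;\ge\; \tfrac12 a^2 .
```

The minimum is attained at $b = a/2$, so the constant $\tfrac12$ cannot be improved.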
We assert that, if $\epsilon$ is chosen appropriately,

(39)   $\min_{j,\gamma}\mathbb P\|Q_\gamma\wedge Q_{\psi_j(\gamma)}\|$ stays bounded away from zero as $n\to\infty$,

which will ensure that the lower bound in (38) is eventually larger than a constant
multiple of $\sum_{j\in J}\beta_j^2 \ge c\rho_n$ for some constant $c > 0$. Inequality (2) will then
follow.

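To prove (39), consider a $\gamma$ in $\Gamma$ and the corresponding $\gamma' = \psi_j(\gamma)$. By virtue of
the inequality

$$\|Q_\gamma\wedge Q_{\gamma'}\| = 1 - \|Q_\gamma - Q_{\gamma'}\|_{TV} \ge 1 - \Bigl(2\wedge\sum_{i\le n}h^2\bigl(Q_{\lambda_i(\gamma)}, Q_{\lambda_i(\gamma')}\bigr)\Bigr)^{1/2}$$

it is enough to show that

(40)   $\limsup_{n\to\infty}\max_{j,\gamma}\mathbb P\Bigl(2\wedge\sum_{i\le n}h^2\bigl(Q_{\lambda_i(\gamma)}, Q_{\lambda_i(\gamma')}\bigr)\Bigr) < 1.$

Define $\mathcal X_n = \{\max_{i\le n}\|Z_i\|^2 \le C_0\log n\}$, with the constant $C_0$ large enough that
$\mathbb P\mathcal X_n^c = o(1)$. On $\mathcal X_n$ we have

$$|\lambda_i(\gamma)|^2 \le \sum_{j\in J}\beta_j^2\,\|Z_i\|^2 = O(\rho_n)\log n = o(1)$$

and, by inequality (4),

$$h^2\bigl(Q_{\lambda_i(\gamma)}, Q_{\lambda_i(\gamma')}\bigr) \le O_{\mathcal F}(1)\,|\lambda_i(\gamma) - \lambda_i(\gamma')|^2 \le \epsilon^2O_{\mathcal F}(1)\,\beta_j^2z_{i,j}^2.$$

We deduce that

$$\mathbb P\Bigl(2\wedge\sum_{i\le n}h^2\bigl(Q_{\lambda_i(\gamma)}, Q_{\lambda_i(\gamma')}\bigr)\Bigr) \le 2\,\mathbb P\mathcal X_n^c + \sum_{i\le n}\epsilon^2O_{\mathcal F}(1)\,\beta_j^2\,\mathbb P\mathcal X_nz_{i,j}^2$$

$$\le o(1) + \epsilon^2O_{\mathcal F}(1)\,n\beta_j^2\theta_j.$$

The choice of $J$ makes $\beta_j^2\theta_j \le R^2m^{-\alpha-2\beta} \sim R^2/n$. Assertion (40) follows.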
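10. Hellinger distances in an exponential family. We need to show that

$$h^2(Q_\lambda, Q_{\lambda+\delta}) \le \delta^2\psi^{(2)}(\lambda)\,(1 + |\delta|)\,G(|\delta|) \qquad\text{for all real } \lambda \text{ and } \delta.$$

Temporarily write $\lambda'$ for $\lambda + \delta$ and $\overline\lambda$ for $(\lambda + \lambda')/2 = \lambda + \delta/2$.

$$1 - \tfrac12h^2(Q_\lambda, Q_{\lambda'}) = \int\sqrt{f_\lambda(y)f_{\lambda'}(y)}$$

$$= \int\exp\bigl(\overline\lambda y - \tfrac12\psi(\lambda) - \tfrac12\psi(\lambda')\bigr)$$

$$= \exp\bigl(\psi(\overline\lambda) - \tfrac12\psi(\lambda) - \tfrac12\psi(\lambda')\bigr)$$

$$\ge 1 + \psi(\overline\lambda) - \tfrac12\psi(\lambda) - \tfrac12\psi(\lambda').$$

That is,

$$h^2(Q_\lambda, Q_{\lambda'}) \le \psi(\lambda) + \psi(\lambda + \delta) - 2\psi(\lambda + \delta/2).$$

By Taylor expansion in $\delta$ around 0, the right-hand side is less than

$$\tfrac14\delta^2\psi^{(2)}(\lambda) + \tfrac16\delta^3\Bigl(\psi^{(3)}(\lambda + \delta^*) - \tfrac14\psi^{(3)}(\lambda + \delta^*/2)\Bigr),$$

where $0 < |\delta^*| < |\delta|$. Invoke inequality (3) twice to bound the coefficient of $\delta^3/6$
in absolute value by

$$\psi^{(2)}(\lambda)\bigl(G(|\delta|) + \tfrac14G(|\delta|/2)\bigr) \le \tfrac54\psi^{(2)}(\lambda)\,G(|\delta|).$$

The stated bound simplifies some unimportant constants.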
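The intermediate bound by a second difference of $\psi$ can be checked exactly in one concrete family. The sketch below assumes the Poisson family in canonical form, for which $\psi(\lambda) = e^\lambda$ (an illustrative choice, not a case singled out in the text); it compares the closed-form Hellinger distance with a direct summation over the probability mass functions and with the second-difference bound.

```python
import math

def hellinger_sq_closed_form(lam, lam_prime):
    """h^2 from 1 - h^2/2 = exp(psi(mid) - psi(lam)/2 - psi(lam')/2) with psi = exp."""
    mid = 0.5 * (lam + lam_prime)
    exponent = math.exp(mid) - 0.5 * math.exp(lam) - 0.5 * math.exp(lam_prime)
    return 2.0 * (1.0 - math.exp(exponent))

def hellinger_sq_direct(lam, lam_prime, terms=200):
    """Sum of (sqrt(p_y) - sqrt(q_y))^2 over y for Poisson means exp(lam), exp(lam')."""
    a, b = math.exp(lam), math.exp(lam_prime)
    total = 0.0
    for y in range(terms):
        log_p = -a + y * math.log(a) - math.lgamma(y + 1)
        log_q = -b + y * math.log(b) - math.lgamma(y + 1)
        total += (math.exp(0.5 * log_p) - math.exp(0.5 * log_q)) ** 2
    return total

def second_difference(lam, delta):
    """psi(lam) + psi(lam + delta) - 2 * psi(lam + delta/2) for psi = exp."""
    return math.exp(lam) + math.exp(lam + delta) - 2.0 * math.exp(lam + 0.5 * delta)

checks = [
    (lam, delta,
     hellinger_sq_closed_form(lam, lam + delta),
     hellinger_sq_direct(lam, lam + delta),
     second_difference(lam, delta))
    for lam in (-1.0, 0.0, 0.7)
    for delta in (-0.5, 0.1, 1.0)
]
for lam, delta, closed, direct, bound in checks:
    print(f"lambda={lam:+.1f} delta={delta:+.1f} h2={closed:.6f} bound={bound:.6f}")
```

For this family the second difference equals the squared difference of the square roots of the two Poisson means, so the inequality reduces to $2(1 - e^{-x/2}) \le x$ for $x \ge 0$.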

11. Bounds for Gaussian processes. As a consequence of defining property (K),
the centered process $Z := X - \mu$ has an expansion $Z(t) = \sum_{k\in\mathbb N}\sqrt{\theta_k}\,\eta_k\phi_k(t)$
where the $\eta_k$'s are independent $N(0,1)$'s, implying

$$\|Z\|^2 = \int\sum_{k,k'\in\mathbb N}\sqrt{\theta_k\theta_{k'}}\;\eta_k\eta_{k'}\,\phi_k(t)\phi_{k'}(t)\,dt = \sum_{k\in\mathbb N}\theta_k\eta_k^2.$$

LEMMA 41. Suppose $W_i = \sum_{k\in\mathbb N}\tau_{i,k}\eta_{i,k}^2$ for $i = 1,\dots,n$, where the $\eta_{i,k}$'s
are independent standard normals and the $\tau_{i,k}$'s are nonnegative constants with
$\infty > T := \max_{i\le n}\sum_{k\in\mathbb N}\tau_{i,k}$. Then

$$\mathbb P\{\max_{i\le n}W_i > 4T(\log n + x)\} < 2e^{-x} \qquad\text{for each } x \ge 0.$$

PROOF. Without loss of generality suppose $T = 1$. For $s = 1/4$, note that

$$\mathbb P\exp(sW_i) = \prod_{k\in\mathbb N}(1 - 2s\tau_{i,k})^{-1/2} \le \exp\Bigl(\sum_{k\in\mathbb N}2s\tau_{i,k}\Bigr) \le e^{1/2}$$

by virtue of the inequality $-\log(1-t) \le 2t$ for $0 \le t \le 1/2$. With the same $s$, it then
follows that

$$\mathbb P\{\max_{i\le n}W_i > 4(\log n + x)\} \le \exp\bigl(-4s(\log n + x)\bigr)\,\mathbb P\exp\bigl(\max_{i\le n}sW_i\bigr)$$

$$\le e^{-x}\,n^{-1}\sum_{i\le n}\mathbb P\exp(sW_i).$$

The 2 is just a clean upper bound for $e^{1/2}$. □
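The maximal inequality can be examined numerically. The sketch below assumes hypothetical weights $\tau_k \propto k^{-2}$ shared by all $i$ and normalized so that $T = 1$ (illustrative choices only); it evaluates the moment generating function at $s = 1/4$ exactly, compares it with $e^{1/2}$, and estimates the tail probability of the maximum by simulation.

```python
import numpy as np

def mgf_quarter(tau):
    """Exact E exp(W/4) for W = sum_k tau_k * eta_k^2 with eta_k independent N(0,1)."""
    return float(np.prod((1.0 - 0.5 * tau) ** -0.5))

rng = np.random.default_rng(4)
n, K, reps = 20, 30, 2_000                     # hypothetical sizes
tau = 1.0 / np.arange(1, K + 1) ** 2
tau = tau / tau.sum()                          # normalize so that T = sum_k tau_k = 1
eta = rng.standard_normal((reps, n, K))
W_max = np.max(np.sum(tau * eta**2, axis=2), axis=1)   # max_i W_i in each replication

for x in (0.0, 0.5, 1.0):
    empirical = np.mean(W_max > 4.0 * (np.log(n) + x))
    print(f"x={x}: empirical tail {empirical:.4f}, bound {2 * np.exp(-x):.4f}")
print(mgf_quarter(tau), mgf_quarter(np.array([1.0])), np.exp(0.5))
```

The extreme case of a single weight $\tau_1 = 1$ gives $\mathbb P\exp(W/4) = \sqrt2$, which shows how little room there is between the exact value and the constant 2 in the statement.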
COROLLARY 42.

$$\mathbb P_n\{\max_{i\le n}\|Z_i\|^2 > C'(\log n + x)\} < 2e^{-x}$$

where $C' = 4C\sum_{k\in\mathbb N}k^{-\alpha} < \infty$.